Date post: | 28-Dec-2015 |
Upload: | ambrose-paul |
PHYSICAL DATABASE DESIGN
Physical database design is concerned with issues of database implementation:
Implementation design
Database storage, access & location
File organization & constraints
THE THREE FORMS OF DATA
External
Conceptual/Base table
Internal/Hardware level
[Figure: the same customer records (100, 200, 300, ...) shown at each of the three levels]
These three levels provide logical and physical data independence.
Cust#  Name    Address         Balance
100    Gordon  110 Oak Street  $400
200    Prasad  22 Birch Place  $2500
300    ...     ...             ...
THE THREE TYPES OF MODELS
Level       Model               Facilities
External    Views               Create view, Drop view
Conceptual  Schemas             Create table, Alter table
Internal    File organizations  Create index, Drop index
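A minimal sketch of the three facilities, using Python's built-in sqlite3 module with an in-memory database. The table, view and index names (customer, customer_balance, idx_customer_name) are hypothetical; the point is only that each SQL facility acts at a different level.

```python
import sqlite3

# In-memory database; all object names below are hypothetical.
conn = sqlite3.connect(":memory:")

# Conceptual level: a base table (CREATE TABLE / ALTER TABLE).
conn.execute(
    "CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT, address TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(100, "Gordon", "110 Oak Street", 400), (200, "Prasad", "22 Birch Place", 2500)],
)
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")

# External level: a view that exposes only part of the base table (CREATE VIEW).
conn.execute("CREATE VIEW customer_balance AS SELECT cust_no, balance FROM customer")

# Internal level: an index changes how rows are found, not what users see (CREATE INDEX).
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")

balances = conn.execute(
    "SELECT cust_no, balance FROM customer_balance ORDER BY cust_no"
).fetchall()
print(balances)

# Dropping the index and the view leaves the base table untouched (DROP INDEX / DROP VIEW).
conn.execute("DROP INDEX idx_customer_name")
conn.execute("DROP VIEW customer_balance")
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Because users query the view, the index (internal level) can be created or dropped without changing any query they write.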
COMPONENTS OF PHYSICAL DESIGN
1. Implementation design
2. Storage, access & distribution strategies
3. File organizations
4. Specifications for integrity constraints (later)
IMPLEMENTATION DESIGN
Decide on tables (de-normalization)
Decide on primary and cross reference keys (not discussed further)
Decide on attribute data types (not discussed further)
E.g., fixed- vs. variable-length fields;
integer vs. double integer
Design reports and forms (not discussed further)
Implementation design takes the results of normalization and designs the tables, attributes, and data types for implementation.
Field name  Data type  Description                Length  Decimals
Prod#       Numeric    Unique product code        6       0
Descr       Text       Short product description  25      0
Price       Currency   Product price              6       2
DECIDING ON TABLES
Denormalization means going back down the normal forms (combining tables) to reduce schema overhead.
Denormalization Example (for 1:1)
Normalized:
Parts(Part#, PartName, ...)
Container(ContainerID, #fin, #needed, Part#)
Denormalized:
Parts(Part#, PartName, ContainerID, #fin, #needed)
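The 1:1 denormalization above can be sketched in SQLite: the container columns are folded into Parts with a join. Column names such as part_no, n_fin and n_needed stand in for Part#, #fin and #needed, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: one container per part (1:1), kept in two tables.
    CREATE TABLE parts (part_no INTEGER PRIMARY KEY, part_name TEXT);
    CREATE TABLE container (
        container_id INTEGER PRIMARY KEY,
        n_fin INTEGER,                                      -- #fin
        n_needed INTEGER,                                   -- #needed
        part_no INTEGER UNIQUE REFERENCES parts (part_no)   -- UNIQUE enforces 1:1
    );
    INSERT INTO parts VALUES (1, 'Bolt'), (2, 'Nut');
    INSERT INTO container VALUES (501, 40, 100, 1), (502, 10, 60, 2);

    -- Denormalized: one Parts table that also carries the container attributes.
    CREATE TABLE parts_denorm AS
        SELECT p.part_no, p.part_name, c.container_id, c.n_fin, c.n_needed
        FROM parts AS p JOIN container AS c ON c.part_no = p.part_no;
""")
rows = conn.execute("SELECT * FROM parts_denorm ORDER BY part_no").fetchall()
print(rows)
```

For a 1:1 relationship each part still appears exactly once, so the merge removes a join without introducing repeated data.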
DECIDING ON TABLES..
Denormalization Example (for M:N)
[Figure: ER diagram in which ORDERS (Ord#, Ord_dt) "are for" PRODUCTS (Prod#, Descr.), with Qty as an attribute of the relationship]
What tables does normalization result in?
Orders(Ord#, Ord_dt, ...)
Product(Prod#, Descr, ...)
Orders for prod(Prod#, Ord#, Qty)
DENORMALIZATION
Orders(Ord#, Ord_dt, ...)
Product(Prod#, Ord#, Descr, Qty, ...)
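A minimal sketch of the M:N case, assuming hypothetical sample data: the associative table (Orders for prod) is folded into Product, so each product row now also carries an Ord# and a Qty. The count at the end shows the cost of the shortcut.

```python
# Normalized tables (hypothetical sample data).
products = {10: "Gear", 20: "Piston"}                    # Prod# -> Descr
orders_for_prod = [(10, 1, 5), (20, 1, 2), (10, 2, 3)]   # (Prod#, Ord#, Qty)


def denormalize(products, orders_for_prod):
    """Fold the associative table into Product: one row per (product, order) pair."""
    return [
        {"prod_no": prod, "ord_no": order, "descr": products[prod], "qty": qty}
        for prod, order, qty in orders_for_prod
    ]


product_rows = denormalize(products, orders_for_prod)

# Descr is now stored once per order rather than once per product.
copies_of_gear = sum(1 for row in product_rows if row["descr"] == "Gear")
print(copies_of_gear)
```

Unlike the 1:1 case, denormalizing an M:N relationship repeats the product's attributes for every order, so it trades update anomalies for fewer joins.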
COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design
2. Storage and access strategies
3. Distribution strategies
4. File organizations
5. Specifications for integrity constraints (later)
STORAGE & ACCESS STRATEGIES
OBJECTIVES
Estimate storage requirements (volume analysis)
Determine the media to be used (not discussed)
Study how data is accessed (usage analysis)
Use these to develop the file organization (later)
ALSO CALLED VOLUME & USAGE ANALYSIS
Volume and usage analysis is carried out with a composite usage map.
COMPOSITE USAGE MAP
Used for volume & usage analysis and for choosing a file organization
Superimposed on an ER chart
Attributes are not shown
Shows the estimated number of records (volume)
Shows the type of access (dotted lines)
A composite usage map is simply an ER chart (without attributes) that shows the number of records and the frequency/pattern with which they are accessed.
VOLUME & USAGE ANALYSIS
Equipment, Parts and PE tables
Volumes: Equipment: 100; Parts: 12,000; PE: 10,000
20 inquiries per hour to Equipment
300 inquiries per hour on the Parts table
70% of these inquiries also need Equipment information.
Draw a composite usage map, estimate storage requirements, and develop a suitable file organization.
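The access rates for the usage map can be tallied with a few lines of arithmetic. This is a sketch under one stated assumption: inquiries that go from Parts to Equipment pass through the PE table, so each one counts as a PE access and an Equipment access.

```python
def usage_per_hour(equipment_direct, parts_direct, pct_needing_equipment):
    """Hourly accesses per table.

    Assumption: the Parts-to-Equipment path goes through PE.
    Integer percentages avoid floating-point rounding.
    """
    via_parts = parts_direct * pct_needing_equipment // 100
    return {
        "Parts": parts_direct,
        "PE": via_parts,
        "Equipment": equipment_direct + via_parts,
    }


rates = usage_per_hour(equipment_direct=20, parts_direct=300, pct_needing_equipment=70)
print(rates)
```

These per-table totals are the numbers written on the map's access arrows.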
ESTIMATING STORAGE REQUIREMENTS FOR PARTS AND EQUIPMENT
Field sizes (bytes) are in parentheses:
EQUIPMENT (Model# (7), Descr (10), Mfr. (12), Price (2), HP (1), WT (1))
PARTS (Part# (1), Descr (10), Mfr (12), Price (2))
PE (Model# (7), Part# (1), Qty (1))
Equipment table: 7 + 10 + 12 + 2 + 1 + 1 = 33 bytes/record
Parts table: ??
PE table: ??
Total storage requirements = ??
A MORE ELABORATE EXAMPLE
Parts include manufactured parts (40%) and purchased parts (70%).
Volumes: Parts: 1,000; Suppliers: 50; Quotations: 2,500
Total of 200 parts inquiries
60 direct inquiries to purchased parts
Of the purchased parts inquiries, 80 also go to quotation.
Of these 80, 70 go to supplier as well.
75 direct queries to supplier
Of these, 40 are for quotation.
All of these are also for parts.
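ANOTHER EXAMPLE.. A COMPOSITE USAGE MAP
[Figure: composite usage map. PART (1000) has is-a subtypes MANUFACTURED (400, 40%) and PURCHASED (700, 70%); PURCHASED is linked to QUOTATION (2500), which is linked to SUPPLIER (50). Access counts on the arrows: 200, 140, 60, 75, 80, 70, 40, 40, 20, 80.]
Note: numbers of records are in parentheses (red in the original figure); numbers of accesses are on the arrows (blue in the original figure).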
STORAGE REQUIREMENTS
PART TABLE:
PART_NO (5)
DESCRIPTION (15)
LOCATION (10)
QUANTITY (1)
RECORD SIZE: 5 + 15 + 10 + 1 = 31 bytes
FILE SIZE: 31 * 1100 = 34,100 bytes
QUOTATION TABLE:
Estimated record size: 150 bytes
Estimated file size: 150 * 2500 = 375,000 bytes
Note: This is done similarly for the other tables.
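Volume analysis reduces to two formulas: record size is the sum of the field sizes, and file size is record size times the estimated number of records. A minimal sketch using the field sizes from these slides (it assumes no per-record overhead such as headers or pointers):

```python
# Field sizes in bytes, taken from the slides.
EQUIPMENT = {"Model#": 7, "Descr": 10, "Mfr": 12, "Price": 2, "HP": 1, "WT": 1}
PART = {"PART_NO": 5, "DESCRIPTION": 15, "LOCATION": 10, "QUANTITY": 1}


def record_size(fields):
    """Record size = sum of the field sizes (no per-record overhead assumed)."""
    return sum(fields.values())


def file_size(record_bytes, n_records):
    """File size = record size * estimated number of records."""
    return record_bytes * n_records


print(record_size(EQUIPMENT))              # Equipment record, bytes
print(file_size(record_size(PART), 1100))  # Part table, bytes
print(file_size(150, 2500))                # Quotation table (estimated 150-byte records)
```

The total storage requirement is then the sum of the file sizes of all tables.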
COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design
2. Storage & access strategies
3. Distribution strategies
4. File organizations
5. Specifications for integrity constraints (later)
DISTRIBUTION STRATEGIES
Distribution strategies are concerned with where the files are physically located.
1. Centralized
2. Distributed
   Replicated (not discussed)
   Partitioned
DISTRIBUTION STRATEGIES
Centralized -- All the data is stored in one physical location.
Distributed -- The data is stored in multiple physical locations.
Replicated -- The database is duplicated in multiple locations.
Partitioned -- The database is divided into "fragments" and each fragment is stored in a different location.
CENTRALIZED VS DISTRIBUTED
Which is a bottleneck?
Which causes security problems?
Which method may be required for business reasons?
In which setup is data more accessible?
Which provides better performance?
CENTRALIZED STRATEGY
General principle: maximize local access, minimize remote access.
[Figure: three sites S1, S2 and S3, annotated with access volumes 100, 500 and 600]
WHERE SHOULD WE LOCATE THE DATABASE?
S1, S2 or S3?
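The general principle can be turned into a small calculation: placing the database at a site makes that site's accesses local and everyone else's remote, so choose the site with the fewest remote accesses. The mapping S1 = 100, S2 = 500, S3 = 600 below is an assumption about how the figure pairs sites with volumes.

```python
def best_central_site(local_accesses):
    """Pick the site that maximizes local (and so minimizes remote) access."""
    total = sum(local_accesses.values())
    # If the database sits at `site`, every other site's accesses are remote.
    remote = {site: total - local for site, local in local_accesses.items()}
    return min(remote, key=remote.get), remote


# Hypothetical pairing of the figure's access volumes with sites.
site, remote = best_central_site({"S1": 100, "S2": 500, "S3": 600})
print(site, remote)
```

Under that assumption the busiest site wins, because remote access equals total access minus the chosen site's own volume.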
DISTRIBUTED DATABASE
EID   Name       City
2356  Armstrong  LA
3286  Nickerson  SF
3356  Forrester  MPLS
[Figure: partitioning; each row is stored at its own city's site: LA, SF, MPLS]
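The partitioning in the figure is horizontal: whole rows are routed to a fragment by the value of one column (City). A minimal in-memory sketch, where each dictionary entry stands in for a site:

```python
from collections import defaultdict

employees = [
    {"EID": 2356, "Name": "Armstrong", "City": "LA"},
    {"EID": 3286, "Name": "Nickerson", "City": "SF"},
    {"EID": 3356, "Name": "Forrester", "City": "MPLS"},
]


def partition_by(rows, column):
    """Horizontal partitioning: each row goes to the fragment named by its column value."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[column]].append(row)
    return dict(fragments)


fragments = partition_by(employees, "City")
print(sorted(fragments))
```

The union of the fragments is the original table, so no data is lost or duplicated (unlike replication).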
COMPONENTS OF PHYSICAL DESIGN..
1. Implementation design
2. Storage & access strategies
3. Distribution strategies
4. File organizations
5. Specifications for integrity constraints (later)
FILE ORGANIZATION
[Figure: a disk divided into tracks and sectors, with File 1 (Rec. 1, 2, ...) stored on it]
How records are arranged on secondary storage, or the mapping between ____ and ______?
DATA ACCESS (FYI)
[Figure: the DBMS requests data from the O/S; the O/S consults the directory tables (FAT/NTFS) and generates instructions to the IOP (I/O processor), which moves data between the hard-drive partition and RAM]
FILE ORGANIZATION
Database storage: selection criteria
Retrieval time (disk access)
Access type (direct, sequential)
Storage space
Maintenance effort
OVERVIEW OF FILE ORGANIZATIONS..
Sequential -- Records are stored one after another in pkey sequence.
Hashed -- The record address is determined by subjecting the pkey to a hashing algorithm.
Indexed -- Same as sequential, except that the keys are also placed in a separate index file for ease of searching.
THE SEQUENTIAL ORGANIZATION
Records in pkey sequence
Access only sequential
Insertions/deletions in sequential order
Simple organization
Good for batch updates
Part#  Descr.
100    Aux. motors
120    Scrapers
124    Rotors
...    ...
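Because a sequential file can only be read in order, batch updates are classically applied by merging a sorted transaction file into the sorted master file to produce a new master. A minimal sketch of that merge; the operation names "add", "change" and "delete" are hypothetical, and it assumes at most one transaction per key.

```python
def batch_update(master, transactions):
    """Merge sorted transactions into a sorted master file, producing a new master.

    master:       list of (part_no, descr) in part_no order
    transactions: list of (part_no, op, descr) in part_no order,
                  op in {"add", "change", "delete"} (hypothetical names)
    """
    new_master, i, j = [], 0, 0
    while i < len(master) or j < len(transactions):
        if j == len(transactions) or (i < len(master) and master[i][0] < transactions[j][0]):
            new_master.append(master[i])       # no transaction for this record: copy it
            i += 1
        elif i == len(master) or transactions[j][0] < master[i][0]:
            key, op, descr = transactions[j]   # key not in master: only "add" applies
            if op == "add":
                new_master.append((key, descr))
            j += 1
        else:                                  # same key in both files
            key, op, descr = transactions[j]
            if op == "change":
                new_master.append((key, descr))
            elif op != "delete":               # duplicate "add": keep the old record
                new_master.append(master[i])
            i += 1
            j += 1
    return new_master


master = [(100, "Aux. motors"), (120, "Scrapers"), (124, "Rotors")]
transactions = [(110, "add", "Gears"), (120, "delete", None), (124, "change", "Rotor blades")]
new_master = batch_update(master, transactions)
print(new_master)
```

Each file is read once, front to back, which is why batch processing suits this organization.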
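THE HASHING ORGANIZATION
A type of file organization where record addresses are generated by subjecting primary keys to a hashing routine, usually by dividing by a prime number.
Hashing algorithm: Pkey -> hash address
$$\text{Record address} = \operatorname{REM}\left[\frac{\text{Pkey}}{\text{Prime\#}}\right] + \text{Address of starting block}$$
[Figure: the file space begins at starting block address 3432]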
HASHING CONCEPTS
Following are important concepts in hashing:
Hashing algorithm
Hash address
Buckets & bucket size
Slots
Collisions/overflows
Load factor
Search length
[Figure: file space divided into buckets 1, 2, 3, ..., n]
Record address = hash address + physical address of the start of the file space
Example (file space starting at 3432):
Pkey = 43
Hash address = 43 remainder 7 = 1
Record address = 3432 + 1 = 3433
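The two-step address calculation (remainder, then offset from the start of the file space) is a one-line function. A minimal sketch reproducing the Pkey = 43 example:

```python
def record_address(pkey, prime, start_address):
    """Hash address = remainder of pkey / prime; record address = start + hash address."""
    hash_address = pkey % prime
    return start_address + hash_address


print(record_address(43, 7, 3432))
```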
HASHING CONCEPTS..
Hashing algorithm – the formula used to calculate a record address
Hash address – the address (within the block) where a hashed record is stored
Buckets – storage area for a group of records; bucket size is the number of slots
Slots – storage area for an individual record
Collision – when two records hash to the same address
Load factor – the ratio of the number of records to the total space allocated
Average search length – the time it takes to retrieve a record on average (usually expressed in disk accesses)
Disk access – one read of the disk to get a record; a record stored at its hashed address takes one access, otherwise the number depends on where the record is located
HASHING ALGORITHM
Choose a load factor
Identify the number of buckets to be allocated
Select a prime number close to this number
Divide each pkey by the prime number
Remainder = record (bucket) address
Sequentially number the buckets
Place each record at its address
If there are overflows, use open overflow
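The steps above can be sketched as a loader. Two choices here are assumptions rather than the slides' prescription: the prime is taken as the largest prime not above the bucket count (so any buckets beyond it receive only overflow), and open overflow is implemented as linear probing (try the next bucket, wrapping around), charging one disk access per bucket read.

```python
import math


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def largest_prime_at_most(n):
    for k in range(n, 1, -1):
        if is_prime(k):
            return k
    raise ValueError("need at least 2 buckets")


def load_hashed_file(keys, load_factor=0.8, bucket_size=1):
    """Load keys into a hashed file; returns (prime, buckets, disk accesses per key)."""
    # Allocate enough buckets that records / slots <= load_factor.
    n_buckets = math.ceil(len(keys) / (load_factor * bucket_size))
    # A prime close to (here: not above) the number of buckets.
    prime = largest_prime_at_most(n_buckets)
    buckets = [[] for _ in range(n_buckets)]   # buckets numbered 0 .. n_buckets - 1
    accesses = {}
    for key in keys:
        home = key % prime                     # remainder = home bucket
        # Open overflow, assumed here to be linear probing with wrap-around.
        for probe in range(n_buckets):
            b = (home + probe) % n_buckets
            if len(buckets[b]) < bucket_size:
                buckets[b].append(key)
                accesses[key] = probe + 1      # one disk access per bucket read (assumption)
                break
        else:
            raise OverflowError("no free slot left")
    return prime, buckets, accesses


prime, buckets, acc = load_hashed_file([100, 120, 130, 140, 145, 150, 135], load_factor=0.875)
avg_search_length = sum(acc.values()) / len(acc)
print(prime, buckets, avg_search_length)
```

With the part numbers from the hashing example, this allocates 8 buckets and uses prime 7. Under linear probing, 135 collides at bucket 2 and lands in bucket 6 after five probes, so the average search length here depends on the probing assumption.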
HASHING CONCEPTS..
[Figure: file space divided into buckets 1, 2, 3, ..., n]
Collision: when two keys hash to the same address
OVERFLOWS
Open overflow (store the record in unallocated slots)
Chained overflow (store the record in a separate area)
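Chained overflow can be sketched with a separate overflow area plus a chain of overflow positions per home bucket. The search-length rule is an assumption: one access for the home bucket, plus one for each chained overflow record read.

```python
def load_chained(keys, prime, bucket_size=1):
    """Hashed file with chained overflow: full home buckets spill into a separate area."""
    prime_area = [[] for _ in range(prime)]
    overflow_area = []                    # the separate overflow area
    chains = [[] for _ in range(prime)]   # per home bucket: positions in overflow_area
    for key in keys:
        home = key % prime
        if len(prime_area[home]) < bucket_size:
            prime_area[home].append(key)
        else:
            chains[home].append(len(overflow_area))
            overflow_area.append(key)
    return prime_area, overflow_area, chains


def search_length(key, prime, prime_area, overflow_area, chains):
    """Disk accesses to find key: 1 for the home bucket, +1 per chained record read."""
    home = key % prime
    if key in prime_area[home]:
        return 1
    for accesses, pos in enumerate(chains[home], start=2):
        if overflow_area[pos] == key:
            return accesses
    return None  # key is not in the file


area, overflow, chains = load_chained([100, 120, 130, 140, 145, 150, 135], prime=7)
print(overflow, search_length(135, 7, area, overflow, chains))
```

Unlike open overflow, a chained overflow record never occupies another key's home bucket, so later inserts are not displaced by earlier collisions.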
HASHING EXAMPLE
Given Part#s:
Part#  Descr           Part# Mod 7
100    Gears           2
120    Scrapers        1
130    Aux motors      4
140    Crankshafts     0
145    Cylinder heads  5
150    Pistons         3
Assume 8 buckets (0..7).
Assume 1 slot per bucket.
Assume a disk access time of 20 ms.
HASHING EXAMPLE..
FILE LOADINGS
Bucket  Record
0       140 Crankshaft
1       120 Scrapers
2       100 Gears
3       150 Pistons
4       130 Aux. motor
5       145 Cylinders
6
7
Insert: 135 Shovel?
135 Mod 7 = 2
Average search length?
6 records -> 1 access
1 record -> 2 accesses
Load factor: ?
Bucket size = ?
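The questions on this slide can be worked through directly. This sketch takes the access counts stated on the slide (6 records at 1 access, 1 at 2) as given, rather than deriving them from an overflow method, and uses the stated 8 buckets, 1 slot per bucket and 20 ms per access.

```python
PRIME, N_BUCKETS, SLOTS_PER_BUCKET, ACCESS_MS = 7, 8, 1, 20
parts = [100, 120, 130, 140, 145, 150]

# Home bucket of each loaded part.
home = {p: p % PRIME for p in parts}

# Which loaded record already sits at 135's home address?
new_key = 135
clash = [p for p, h in home.items() if h == new_key % PRIME]

n_records = len(parts) + 1
load_factor = n_records / (N_BUCKETS * SLOTS_PER_BUCKET)

# Access counts from the slide: 6 records need 1 access, 1 (overflow) record needs 2.
avg_search_length = (6 * 1 + 1 * 2) / n_records
avg_time_ms = avg_search_length * ACCESS_MS
print(clash, load_factor, round(avg_search_length, 2), round(avg_time_ms, 1))
```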
THE HASHING ORGANIZATION
EVALUATION
H(pkey) --> record address
Records are in hash sequence
Extra space must be allocated
Load factor between 60% and 80%
Good for low-activity (low FAR) files
Used in real-time and OO applications
DISCUSSION
A parts file with Part# as the pkey includes records with the
following part# values:
23, 37, 46, 48, 56, 18, 10, 71, 16, 24, 39, 47 and 69.
The file uses 8 buckets numbered 0 to 7. Each bucket holds
two records.
Load these records into the file in the given order using the
hash function h(K) = K mod 8. Calculate the average search
length in terms of # of disk accesses.
INDEXED ORGANIZATION
A method of file organization where a subset of key values is stored in an index. Types are:
Primary key
Secondary key
Clustered
THE INDEXED ORGANIZATION (ISAM)
Records are in pkey sequence (master file)
but are organized into groups
Grouping information is stored in an index file
Records can be inserted at random
Records can be accessed in sequence or at random
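THE ISAM ORGANIZATION
[Figure: ISAM structure. Index set: a cylinder index (87, 189, 300) holding the highest key on each cylinder, and per-cylinder track indexes (e.g., 43, 69, 87 for cylinder 1; 136, 150, ... for the next). Sequence set: data tracks in key order, e.g., cylinder 1 holds 24 32 43 / 45 62 69 / 74 77 87; other tracks shown include 122 136, 141 150 172, 175 181 189, and 278 281 300 on cylinder N. Overflow tracks hold records that do not fit on their track.]
Note: Assume that the corresponding hardware addresses are stored along with the pkeys.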
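A lookup walks the index set top-down: the cylinder index picks the first cylinder whose highest key is at least the search key, and that cylinder's track index picks the track. A minimal sketch using Python's bisect module; cylinder 1 follows the figure, but the grouping of the remaining keys into tracks is a guess at the figure's layout, and overflow tracks are not modelled.

```python
from bisect import bisect_left

# Data tracks grouped by cylinder, keys ascending. Cylinder 1 follows the figure;
# the track grouping of the later cylinders is illustrative.
cylinders = [
    [[24, 32, 43], [45, 62, 69], [74, 77, 87]],
    [[122, 136], [141, 150], [172, 175, 181, 189]],
    [[250], [278, 281, 300]],
]

# Index set: highest key per track (track index) and per cylinder (cylinder index).
track_index = [[track[-1] for track in cyl] for cyl in cylinders]
cylinder_index = [tracks[-1] for tracks in track_index]


def isam_lookup(key):
    """Return (cylinder, track) holding key, or None if absent."""
    c = bisect_left(cylinder_index, key)   # first cylinder whose highest key >= key
    if c == len(cylinder_index):
        return None
    t = bisect_left(track_index[c], key)   # first track on it whose highest key >= key
    return (c, t) if key in cylinders[c][t] else None


print(cylinder_index, isam_lookup(62), isam_lookup(63))
```

Here the cylinder index comes out as 87, 189, 300, matching the figure; each level narrows the search to one cylinder and then one track before any data track is read.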
INSERTIONS IN ISAM
Identify the track where the record needs to be inserted
If the track is full, insert in the overflow area
If the track has room, insert the pkey in sequence
Update the track index and cylinder index if necessary
ISAM: ADVANTAGES AND DISADVANTAGES
Access is direct or sequential?
Access time dependent on?
Rewrite sequentially
Retrieval time uniform
Suitable for volatile files?
Workhorse organization used in most apps.
SECONDARY KEY INDEX
EMPLOYEE
REC#  E_SSN        E_NAME     E_TITLE     E_SALARY
1.    456-34-8895  Smith      Programmer  $35,000
2.    459-66-6785  Johnson    Analyst     $27,000
3.    467-89-8898  Weintraub  Programmer  $60,000
4.    478-64-8005  Dickson    Manager     $64,000
5.    489-12-5575  Holland    Analyst     $47,000
6.    492-93-4438  Rao        Analyst     $71,000
7.    537-89-8898  McDonald   Manager     $85,000
Secondary key index on E_TITLE:
E_TITLE     REC#
Analyst     2, 5, 6
Manager     4, 7
Programmer  1, 3
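A secondary key index maps each value of a non-unique attribute to the list of record numbers that carry it, as in the E_TITLE table above. A minimal sketch (other EMPLOYEE columns are omitted for brevity):

```python
from collections import defaultdict

# EMPLOYEE file as (REC#, E_NAME, E_TITLE).
employee = [
    (1, "Smith", "Programmer"), (2, "Johnson", "Analyst"), (3, "Weintraub", "Programmer"),
    (4, "Dickson", "Manager"), (5, "Holland", "Analyst"), (6, "Rao", "Analyst"),
    (7, "McDonald", "Manager"),
]


def secondary_index(records, key_of):
    """Map each secondary-key value to the record numbers that carry it."""
    index = defaultdict(list)
    for rec in records:
        index[key_of(rec)].append(rec[0])
    return dict(index)


title_index = secondary_index(employee, key_of=lambda rec: rec[2])
print(title_index["Analyst"])
```

The data file itself is not reordered; the index just lists where each value occurs.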
CLUSTERED INDEX
EMPLOYEE (stored in E_TITLE sequence)
REC#  E_SSN        E_NAME     E_TITLE     E_SALARY
1.    459-66-6785  Johnson    Analyst     $27,000
2.    489-12-5575  Holland    Analyst     $47,000
3.    492-93-4438  Rao        Analyst     $71,000
4.    478-64-8005  Dickson    Manager     $64,000
5.    537-89-8898  McDonald   Manager     $85,000
6.    467-89-8898  Weintraub  Programmer  $60,000
7.    456-34-8895  Smith      Programmer  $35,000
Clustered index on E_TITLE:
E_TITLE     REC#
Analyst     1
Manager     4
Programmer  6
Also known as inverted file organization.
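With a clustered index the rows are physically stored in key order, so the index only needs the first record number for each key value. A minimal sketch; ties within a title keep their input order here, so the order within a group may differ from the slide's table, but the index entries do not.

```python
# EMPLOYEE rows as (E_NAME, E_TITLE), in arrival order.
rows = [("Smith", "Programmer"), ("Johnson", "Analyst"), ("Weintraub", "Programmer"),
        ("Dickson", "Manager"), ("Holland", "Analyst"), ("Rao", "Analyst"),
        ("McDonald", "Manager")]


def cluster(rows, key_of):
    """Store rows physically sorted on the key; index each key value's first REC#."""
    stored = sorted(rows, key=key_of)                # physical order = key order
    index = {}
    for rec_no, row in enumerate(stored, start=1):   # REC# numbering starts at 1
        index.setdefault(key_of(row), rec_no)
    return stored, index


stored, title_index = cluster(rows, key_of=lambda row: row[1])
print(title_index)
```

Since a file can be physically sorted only one way, a table can have only one clustered index.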
INDEXING STRATEGIES
Index if you must
Index on pkey
Index on foreign keys
Index on secondary keys (depending on query frequency)
DISCUSSION
What activities are part of identifying storage strategies?
How is denormalization carried out for M:N relationships?
How many indexes can you have per table?
How many clustered indexes?
Can we sequentially update all records in (a) a hashing organization? (b) an indexed organization?
Is indexing suitable for volatile files?
If an index consists of 3 levels of indexes with the main index in RAM, and a disk access time of 20 ms, how long on average does it take to retrieve a record?
What problems do overflow records cause in hashing?