Date post: | 20-Dec-2015 |
Category: |
Documents |
View: | 214 times |
Download: | 0 times |
3
Introduction
Database = large collection of information Stored on hard disk Retrieval– 100,000 items X 10 ms = 17 minutes– Need to organize data to improve retrieval
speed– How do you organize your paper files?
4
Introduction (continued)
Data Organization– sorting data: phonebook in alpha v. random
order
– “tabs” at key points: dictionary with tabs for each letter
– direct pointers to information = index» table of key element value and address of other data
» e.g. name & mailbox number --> letters in mailbox
6
Relational Database Model
Developed by EF Codd, CJ Date (70s) Table = Entity = Relation Table row = tuple = instance Table column = attribute Table linkage by values Entity-Relationship Model
7
Attributes
Key– uniquely identify row/tuple– may be one or more attributes
Non-key– other properties of instance– dependent on key
8
Retrieval in RDMS
SQL– Select ...– From ...» optional: can be computed
– Where ... Query by example– Fill in the blanks– WONDER
9
Advantages of RDMS
Very strong theoretical basis– storage– retrieval
Easy to implement Conflicts and anomalies can be avoided Intuitive appeal Easy retrieval
10
Relational Model: Summary
popular method of organizing data strong theoretical properties facilitate
retrieval relation = table attribute = column tuple = row key = {attributes} that uniquely identify
each row
11
Relational Model: Summary 2
Allows complex data relationships with multiple tables:– {patient id, patient demographics}– {patient id, patient visit date, blood
pressure} Structured Query Language (SQL)
retrieval» Select patient _name where blood_lead_level > 10
13
Data Models
Definition: A representation of the data and and data relationships of an activity– data: case report of serious E. Coli illness– data: E. Coli serotype O157– relationship: case report “contains” serotype
(semantic relationship) Database Design = Development of a
quality data model
14
Data Model Quality
Correctness– Conceptual (concepts
represented properly)» real world
representation
– Syntactic (relationships represented properly)» real world language
Completeness (wholeness)– Conceptual (all
concepts represented)
– Syntactic (all relationships represented)
15
Entity
“Any distinguishable object that is to be represented in a database” [C.J. Date]
Properties– within scope of model– single concept– a set of distinguishable “instances”– satisfies normalization rules
Also called a relation
16
Entity Guidelines
Relationship with at least one other entity
Unique, descriptive name Class or set of things (not just one) Single meaning (no homonyms) No synonyms (two entities describing
same class)
17
Attribute
“a type or characteristic of an entity” (e.g. “gender” is an attribute of the entity “patient”)
Domain = the set of values from which an attribute may be selected (e.g. the domain of the attribute “gender” is [male, female])
An entity typically has many attributes
18
Attribute Characteristics
Key– value uniquely identifies entity» e.g. “Lab test ID” is key attribute of “specimen”
Non-Key– value does not uniquely identify entity» e.g. “author” does not uniquely identify
“publication”
19
Attribute Characteristics
Atomic = individual data value (one and only one fact)
Description = complete and clear definition– e.g. “professional privilege date” = The date on
which a health care professional is granted privilege to practice in a particular health care facility, establishing the provider’s eligibility for patient care assignments and liability coverage
20
Attribute Guidelines
Unique name– no plurals, possessives, articles, conjunctions,
verbs, or prepositions Clear, complete, unambiguous description Atomic (no positional information) Domain with 2 or more values Originates in only one entity
21
Primary Key
Attribute of an entity whose values uniquely determine its occurrences– {birth certificate number, person name, birth
date, mother name, physician name}– {facility, patient name, physician name, date,
temperature, pulse, blood pressure}
22
Primary Key Characteristics
Stable: does not change over time Minimal: fewest attributes necessary Factless: no hidden information Definitive: value always exists Accessible: available when data created Unique: absolutely no duplicates
23
Relationship
Semantic: “contains”, “is part of”, “belongs to”
One-to-one– serotype of an organism– immunization status of a child
One-to-many– antibiotic resistances of an organism– vaccines administered to a child
24
Relationship Guidelines
No circular references (e.g. “health plans -> markets -> products -> health plans” should be “health plans -> health plan markets -> market products <- health plan products <- health plans”)
Single relationship between two entities No recursive relationships
25
Single Relationship of Entities
Problem: employee <--> job assignment <--> job [double circular]
Solution: person --> job assignment <-- job [“person” includes “status” as employee or contractor]
26
Recursive Relationships
Problem: supervisor <--> employee [may be circular]
Solution: person {person ID } --> employee-supervisor relationship {employee ID, supervisor ID}– multiple supervisory roles for each person
27
Normalization
Formalization of common sense rules of information organization
An attribute is functionally dependent on X only if each of its values is determined by the value of X (X may be composite)
Example: DOB is functionally dependent on Driver’s License number
28
Key Normalization Concept
Functional dependence of each entity must be– based on entire primary key– NOT based on any other attributes
29
Benefits of Normalization
Aids in database design, integration Ensures precise capture of business logic Minimizes redundancy Minimizes need for null values Prevents– information loss– unintentional results
31
Common DB Design Errors
Multiple instances in same row of table, e.g. first_value, second_value, third_value [Problem: what to do with 4th value?]
Same data item repeated in multiple places, e.g. address appears in two different tables [Problem: how to keep two values synchronized?]
32
Data Model Quality
Correctness– Conceptual (concepts
represented properly)» real world
representation
– Syntactic (relationships represented properly)» real world language
Completeness (wholeness)– Conceptual (all
concepts represented)
– Syntactic (all relationships represented)
33
Database Design Pearls
Accommodate all data needed Correct relationships between data items No duplicate representation Anticipated retrievals use indexes Meet confidentiality requirements
34
References - 1
Reingruber MC & Gregory WW: The Data Modeling Handbook (New York: John Wiley & Sons, 1994)
Montgomery SL: Object-Oriented Information Engineering (Boston: AP Professional/Harcourt Brace, 1994)