+ All Categories
Home > Documents > 1 Databases and Database Design William A. Yasnoff, MD, PhD Oregon Health Division.

1 Databases and Database Design William A. Yasnoff, MD, PhD Oregon Health Division.

Date post: 20-Dec-2015
Category:
View: 214 times
Download: 0 times
Share this document with a friend
35
1 Databases and Database Design William A. Yasnoff, MD, PhD Oregon Health Division
Transcript

1

Databases and Database Design

William A. Yasnoff, MD, PhD

Oregon Health Division

2

Databases & Database Design

Introduction Relational Databases Database Design: Data Models

3

Introduction

Database = large collection of information Stored on hard disk Retrieval– 100,000 items X 10 ms = 17 minutes– Need to organize data to improve retrieval

speed– How do you organize your paper files?

4

Introduction (continued)

Data Organization– sorting data: phonebook in alpha v. random

order

– “tabs” at key points: dictionary with tabs for each letter

– direct pointers to information = index» table of key element value and address of other data

» e.g. name & mailbox number --> letters in mailbox

5

Relational Database Model

6

Relational Database Model

Developed by EF Codd, CJ Date (70s) Table = Entity = Relation Table row = tuple = instance Table column = attribute Table linkage by values Entity-Relationship Model

7

Attributes

Key– uniquely identify row/tuple– may be one or more attributes

Non-key– other properties of instance– dependent on key

8

Retrieval in RDMS

SQL– Select ...– From ...» optional: can be computed

– Where ... Query by example– Fill in the blanks– WONDER

9

Advantages of RDMS

Very strong theoretical basis– storage– retrieval

Easy to implement Conflicts and anomalies can be avoided Intuitive appeal Easy retrieval

10

Relational Model: Summary

popular method of organizing data strong theoretical properties facilitate

retrieval relation = table attribute = column tuple = row key = {attributes} that uniquely identify

each row

11

Relational Model: Summary 2

Allows complex data relationships with multiple tables:– {patient id, patient demographics}– {patient id, patient visit date, blood

pressure} Structured Query Language (SQL)

retrieval» Select patient _name where blood_lead_level > 10

12

Database Design: Data Models

13

Data Models

Definition: A representation of the data and and data relationships of an activity– data: case report of serious E. Coli illness– data: E. Coli serotype O157– relationship: case report “contains” serotype

(semantic relationship) Database Design = Development of a

quality data model

14

Data Model Quality

Correctness– Conceptual (concepts

represented properly)» real world

representation

– Syntactic (relationships represented properly)» real world language

Completeness (wholeness)– Conceptual (all

concepts represented)

– Syntactic (all relationships represented)

15

Entity

“Any distinguishable object that is to be represented in a database” [C.J. Date]

Properties– within scope of model– single concept– a set of distinguishable “instances”– satisfies normalization rules

Also called a relation

16

Entity Guidelines

Relationship with at least one other entity

Unique, descriptive name Class or set of things (not just one) Single meaning (no homonyms) No synonyms (two entities describing

same class)

17

Attribute

“a type or characteristic of an entity” (e.g. “gender” is an attribute of the entity “patient”)

Domain = the set of values from which an attribute may be selected (e.g. the domain of the attribute “gender” is [male, female])

An entity typically has many attributes

18

Attribute Characteristics

Key– value uniquely identifies entity» e.g. “Lab test ID” is key attribute of “specimen”

Non-Key– value does not uniquely identify entity» e.g. “author” does not uniquely identify

“publication”

19

Attribute Characteristics

Atomic = individual data value (one and only one fact)

Description = complete and clear definition– e.g. “professional privilege date” = The date on

which a health care professional is granted privilege to practice in a particular health care facility, establishing the provider’s eligibility for patient care assignments and liability coverage

20

Attribute Guidelines

Unique name– no plurals, possessives, articles, conjunctions,

verbs, or prepositions Clear, complete, unambiguous description Atomic (no positional information) Domain with 2 or more values Originates in only one entity

21

Primary Key

Attribute of an entity whose values uniquely determine its occurrences– {birth certificate number, person name, birth

date, mother name, physician name}– {facility, patient name, physician name, date,

temperature, pulse, blood pressure}

22

Primary Key Characteristics

Stable: does not change over time Minimal: fewest attributes necessary Factless: no hidden information Definitive: value always exists Accessible: available when data created Unique: absolutely no duplicates

23

Relationship

Semantic: “contains”, “is part of”, “belongs to”

One-to-one– serotype of an organism– immunization status of a child

One-to-many– antibiotic resistances of an organism– vaccines administered to a child

24

Relationship Guidelines

No circular references (e.g. “health plans -> markets -> products -> health plans” should be “health plans -> health plan markets -> market products <- health plan products <- health plans”)

Single relationship between two entities No recursive relationships

25

Single Relationship of Entities

Problem: employee <--> job assignment <--> job [double circular]

Solution: person --> job assignment <-- job [“person” includes “status” as employee or contractor]

26

Recursive Relationships

Problem: supervisor <--> employee [may be circular]

Solution: person {person ID } --> employee-supervisor relationship {employee ID, supervisor ID}– multiple supervisory roles for each person

27

Normalization

Formalization of common sense rules of information organization

An attribute is functionally dependent on X only if each of its values is determined by the value of X (X may be composite)

Example: DOB is functionally dependent on Driver’s License number

28

Key Normalization Concept

Functional dependence of each entity must be– based on entire primary key– NOT based on any other attributes

29

Benefits of Normalization

Aids in database design, integration Ensures precise capture of business logic Minimizes redundancy Minimizes need for null values Prevents– information loss– unintentional results

30

Summary of Normalization

One Fact in One Place

31

Common DB Design Errors

Multiple instances in same row of table, e.g. first_value, second_value, third_value [Problem: what to do with 4th value?]

Same data item repeated in multiple places, e.g. address appears in two different tables [Problem: how to keep two values synchronized?]

32

Data Model Quality

Correctness– Conceptual (concepts

represented properly)» real world

representation

– Syntactic (relationships represented properly)» real world language

Completeness (wholeness)– Conceptual (all

concepts represented)

– Syntactic (all relationships represented)

33

Database Design Pearls

Accommodate all data needed Correct relationships between data items No duplicate representation Anticipated retrievals use indexes Meet confidentiality requirements

34

References - 1

Reingruber MC & Gregory WW: The Data Modeling Handbook (New York: John Wiley & Sons, 1994)

Montgomery SL: Object-Oriented Information Engineering (Boston: AP Professional/Harcourt Brace, 1994)

35

References - 2

Codd EF: The Relational Model for Database Management (Reading, MA: Addison-Wesley, 1990)

Date CJ: An Introduction to Database Systems, 5th ed. (Reading, MA: Addison-Wesley, 1990)

Duncan KA: Health Information and Health Reform (San Francisco: Jossey-Bass, 1994)


Recommended