www.infotech.monash.edu.au/FIT1004/
FIT1004 Database
Topic 1: Introduction
Learning Objectives:
• Data, Database, DBMS
• Understand the motivation for the Database Approach
• The Database System Environment
• Objectives of Database Technology
• DBMS Functions
• DB Models
• Relational Database Model
References:
• Rob, P. & Coronel, C., Database Systems, 7th Edition, Chapter 1; Chapter 2, Sections 2.1, 2.3, 2.4
Where We Are
• Introduction to Database Systems
• The Relational Model
• Conceptual Design
• Logical Design
• Normalisation
• Database Lifecycle
• Physical Design
• SQL (DML)
• SQL (DDL & DCL)
• Implementation
• Transaction Management
• Database Administration
• Data Warehousing & Data Mining
Data vs. Information
• Data are raw facts concerning people, places, events, or other objects and concepts.
• Data by itself is of little use unless it is somehow aggregated, organised and prepared in a form convenient for decision making or other organisational activities.
• Data that has been processed to reveal its meaning becomes information, e.g. total sales per quarter.
• A lack of data can lead to inadequate information and thus to ill-informed decisions and, ultimately, business failure.
• Data is a valuable corporate resource that requires adequate integrity and security controls.
• Data management is a discipline that focuses on the proper generation, storage and retrieval of data.
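As a minimal sketch of turning data into information, assuming Python's built-in sqlite3 module and a hypothetical SALE table of individual sales (the dates and amounts are made up), the query below aggregates raw sales facts into total sales per quarter:

```python
import sqlite3

# Hypothetical raw data: individual sales facts (date, amount).
RAW_SALES = [
    ("2024-01-15", 120.0),
    ("2024-02-03", 80.0),
    ("2024-04-20", 200.0),
    ("2024-05-11", 50.0),
    ("2024-08-30", 75.0),
]


def sales_per_quarter(rows):
    """Turn raw sales facts into information: total sales per quarter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sale (sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sale VALUES (?, ?)", rows)
    # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on (integer division).
    result = conn.execute(
        """
        SELECT (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
               SUM(amount) AS total
        FROM sale
        GROUP BY quarter
        ORDER BY quarter
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for quarter, total in sales_per_quarter(RAW_SALES):
        print(f"Q{quarter}: {total:.2f}")
```

Each individual sale is data; the per-quarter totals are the information a manager would act on.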
[Figure: Data vs. Information]
Database and DBMS
• A database is a shared, integrated computer structure that houses a collection of:
– end-user data
– metadata, or data about data, through which the data is integrated and managed
• A database management system (DBMS)
– is a collection of programs that manages the database structure and controls access to the data stored in the database
– contains a query language
Traditional File Systems
Databases are often contrasted with traditional file systems, although such file systems are now rarely used. Because the problems of file systems were the impetus for the development of the “Database Concept”, it is worth noting some of them.
• Problems:
– Requires extensive programming in a third-generation language (3GL)
– Data and structural dependence
– Data redundancy
Traditional File Systems
• Requests for information (reports) required a data processing (DP) specialist to write programs for the department that required the report
• File systems were developed to address these needs
• Data was organised in files according to its expected use, which led to islands of information
• Retrieving data required extensive programming in a third-generation language (3GL); this was time consuming and made ad hoc queries impossible
• The same data was often stored in many different locations, e.g. agent details occurred in both the CUSTOMER and AGENT files
Traditional File Systems
• In the past, as new applications were written, they either used existing files or created new files for their own use.
• Sometimes several existing files needed to be sorted and merged to obtain the new file. Often several files contained the same information stored in different ways; in other words, there would be redundant and possibly inconsistent data.
• Example of an insurance company file
Traditional File Systems
• Data Dependence
– Changes in a file’s data characteristics require modification of the data access programs
– The program must be told what to do and how to do it
– Makes file systems cumbersome from programming and data management viewpoints
• Structural Dependence
– A change in file structure requires modification of related programs
• Data Redundancy
– Different and conflicting versions of the same data
– Results of uncontrolled data redundancy:
> Data anomalies: modification, insertion, deletion
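A modification anomaly can be sketched in a few lines of Python. The file layouts, field names and phone numbers below are hypothetical; the point is that an application which updates only the AGENT file leaves the redundant copies in the CUSTOMER file out of date:

```python
import copy

# Hypothetical file contents: each "file" is a list of records (dicts).
# The agent's phone number is stored redundantly in both files.
AGENT_FILE = [{"agent_code": 501, "phone": "555-0101"}]
CUSTOMER_FILE = [
    {"cus_code": 1, "name": "Ramas", "agent_code": 501, "agent_phone": "555-0101"},
    {"cus_code": 2, "name": "Dunne", "agent_code": 501, "agent_phone": "555-0101"},
]


def update_agent_phone(agent_file, agent_code, new_phone):
    """An application that knows only about the AGENT file updates it alone."""
    for record in agent_file:
        if record["agent_code"] == agent_code:
            record["phone"] = new_phone


def inconsistent_customers(agent_file, customer_file):
    """Customer codes whose copy of the agent phone disagrees with the AGENT file."""
    phone_by_agent = {r["agent_code"]: r["phone"] for r in agent_file}
    return [
        r["cus_code"]
        for r in customer_file
        if r["agent_phone"] != phone_by_agent.get(r["agent_code"])
    ]


if __name__ == "__main__":
    agents = copy.deepcopy(AGENT_FILE)
    customers = copy.deepcopy(CUSTOMER_FILE)
    update_agent_phone(agents, 501, "555-0199")
    print(inconsistent_customers(agents, customers))  # [1, 2]
```

After one update, the same fact (the agent's phone) now has two conflicting versions: data inconsistency.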
Traditional File Systems
• Data Redundancy (cont.)
– Data inconsistency
> Lack of data integrity
• File Terminology
– Field: a group of characters with a specific meaning
– Record: logically connected fields that describe a person, place, or thing
– File: a collection of related records
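The field/record/file terminology, and the structural dependence it leads to, can be sketched with a hypothetical fixed-width customer file. The layout and names are assumptions for illustration: the program hard-codes each field's character positions, so changing the file structure silently breaks it:

```python
# Hypothetical fixed-width layout: (field name, start offset, end offset).
LAYOUT = [("cus_code", 0, 5), ("lname", 5, 15), ("fname", 15, 25)]

# A small file: each line is a record, each record holds three fields.
CUSTOMER_FILE = "\n".join(
    f"{code:<5}{lname:<10}{fname:<10}"
    for code, lname, fname in [("10010", "Ramas", "Alfred"), ("10011", "Dunne", "Leona")]
)


def read_records(text, layout=LAYOUT):
    """Split the file into records (lines) and each record into fields by position."""
    return [
        {name: line[start:end].strip() for name, start, end in layout}
        for line in text.splitlines()
    ]


if __name__ == "__main__":
    for record in read_records(CUSTOMER_FILE):
        print(record)

    # Structural dependence: widen LNAME to 15 characters and the unchanged
    # program reads the wrong characters for FNAME.
    widened = f"{'10010':<5}{'Ramas':<15}{'Alfred':<10}"
    print(read_records(widened)[0]["fname"])  # Alfre
```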
Traditional File Systems
• Applications were often considered in relative isolation.
• Data that should have been together was not.
• The potential for flexible enquiry and reporting was limited.
• All validations were in the programs.
• Procedures were required for backup and recovery.
• All programmers had access to all records.
• There was limited concurrent access.
Database Systems
• Consist of logically related data stored in a single repository
• Provide advantages over the file system management approach
– Eliminate most of the data inconsistency, data anomaly, data dependence, and structural dependence problems of file systems
– Store data structures, relationships, and access paths
• The centralised control of data means that, for many applications, the data already exists.
• The data is no longer related by application programs, but by the structure defined in the database.
[Figure: Database vs. File Systems]
[Figure: The Database System Environment]
Objectives of Database Technology
• Data Independence
• Minimal Data Redundancy
• Increased Data Sharing
• Improved Data Quality
• Improved Security of Data
• Improved Access to Data
• Reduced Program Maintenance
• Inter-relate data through the model
Data Independence
• The property of being able to change the logical or physical structure of data without requiring changes to the application programs that manipulate that data
• Data is stored independently of the programs
• The degree of data independence depends on how much of the description of the data is embedded in application programs
• Can the database structure be changed with no impact on programs?
• The database catalog, or data dictionary, plays a central role
• Without data independence, maintenance costs are high!
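A minimal sketch of data independence, assuming Python's sqlite3 module and a hypothetical CUSTOMER table: because the application names only the columns it needs, adding a column to the table does not require the program to change:

```python
import sqlite3


def customer_names(conn):
    """Hypothetical application code: it names only the columns it uses."""
    return conn.execute(
        "SELECT cus_code, cus_name FROM customer ORDER BY cus_code"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(10010, "Ramas"), (10011, "Dunne")])
before = customer_names(conn)

# Change the logical structure of the table by adding a new column.
conn.execute("ALTER TABLE customer ADD COLUMN cus_email TEXT")
after = customer_names(conn)

print(before == after)  # True: the program did not need to change
```

The DBMS resolves column names through its catalog at run time, rather than the program relying on record layouts compiled into it.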
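Logical vs Physical Data Independence
[Figure: three levels, from top to bottom: application program local views; the global logical database description; physical files. Logical data independence lies between the local views and the global logical description; physical data independence lies between the global logical description and the physical files.]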
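Physical data independence can be sketched by changing only the storage level. Assuming Python's sqlite3 module and a hypothetical EMP table (rows taken from the EMP examples later in this topic), adding an index creates a new access path, yet the query text and its result stay the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(7839, "KING", 10), (7698, "BLAKE", 30), (7782, "CLARK", 10), (7566, "JONES", 20)],
)

QUERY = "SELECT ename FROM emp WHERE deptno = 10 ORDER BY ename"
before = conn.execute(QUERY).fetchall()

# Physical change only: add an index. The logical description of EMP is
# unchanged, so the application's query does not change either.
conn.execute("CREATE INDEX emp_deptno_idx ON emp (deptno)")
after = conn.execute(QUERY).fetchall()

print(before == after, after)  # True [('CLARK',), ('KING',)]
```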
Minimal Data Redundancy
• Minimise the duplication of data, i.e. data stored in more than one location
Example 1:
CUSTOMER (CNbr, CName, CAddress)
INVOICE (InvNbr, InvDate, CNbr, InvTotal, CAddress)
Example 2:
CUSTOMER (CNbr, CName, CAddress, Last_InvNbr, Last_InvDate)
INVOICE (InvNbr, InvDate, CNbr, InvTotal)
Example 3:
CUSTOMER (CNbr, CName, CAddress, CBalance)
INVOICE (InvNbr, InvDate, CNbr, InvTotal)
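A minimal sketch of removing the duplicated CAddress from INVOICE, assuming Python's sqlite3 module and made-up rows: the address is stored once in CUSTOMER and retrieved for each invoice through CNbr, so a single UPDATE keeps every invoice consistent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CAddress is stored once, in CUSTOMER only (no copy in INVOICE).
conn.execute("CREATE TABLE customer (cnbr INTEGER PRIMARY KEY, cname TEXT, caddress TEXT)")
conn.execute(
    "CREATE TABLE invoice (invnbr INTEGER PRIMARY KEY, invdate TEXT, "
    "cnbr INTEGER REFERENCES customer (cnbr), invtotal REAL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ramas', '12 High St')")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?)",
    [(1001, "2024-03-01", 1, 150.0), (1002, "2024-03-09", 1, 80.0)],
)

INVOICE_ADDRESSES = """
    SELECT i.invnbr, c.caddress
    FROM invoice AS i JOIN customer AS c ON c.cnbr = i.cnbr
    ORDER BY i.invnbr
"""

# One UPDATE is enough: every invoice sees the new address through the join.
conn.execute("UPDATE customer SET caddress = '7 Park Rd' WHERE cnbr = 1")
print(conn.execute(INVOICE_ADDRESSES).fetchall())
# [(1001, '7 Park Rd'), (1002, '7 Park Rd')]
```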
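Sharing of Data
• The DBMS should support multiple concurrent users of the same data and ensure that the data remains consistent at all times.
[Figure: Part 2 starts with QOH = 10. TX 1 and TX 2 both read QOH = 10. TX 1 computes QOH = QOH + 10 and writes 20; TX 2 computes QOH = QOH - 5 and writes 5. One transaction's update overwrites the other's, so the stored QOH ends up as 20 or 5 instead of the correct 15.]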
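A deterministic sketch of this lost update, written as plain Python rather than real concurrent transactions (the interleaving is hard-coded as an assumption so the result is repeatable):

```python
def lost_update(qoh):
    """Unsynchronised interleaving of TX 1 (+10) and TX 2 (-5) on the same QOH."""
    tx1_value = qoh        # TX 1 reads QOH
    tx2_value = qoh        # TX 2 reads QOH before TX 1 has written
    qoh = tx1_value + 10   # TX 1 writes its result
    qoh = tx2_value - 5    # TX 2 writes, overwriting TX 1's update
    return qoh


def serial(qoh):
    """The same two transactions run one after the other."""
    qoh = qoh + 10         # TX 1 completes first
    qoh = qoh - 5          # TX 2 then reads the updated value
    return qoh


print(lost_update(10), serial(10))  # 5 15
```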
Sharing of Data
Trans 1: Xlock(P1); Read P1 (20); QOH = QOH + 15; Write P1 (35); Unlock
Trans 2: Attempt to lock P1; wait for Trans 1; Read P1 (35); QOH = QOH - 10; Write P1 (25)
Part P1 QOH: 20 initially, 35 after Trans 1, 25 after Trans 2
To avoid concurrency problems, transactions must be made logically serial. One common technique is record locking: a transaction locks a record, preventing other transactions from updating it, until its own update has completed.
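Record locking can be imitated with Python's threading module. This is only an analogy, assuming a hypothetical in-memory PartRecord class with one exclusive lock per record rather than a real DBMS lock manager; because each transaction holds the lock across its read and write, the result is 25 whichever transaction runs first:

```python
import threading


class PartRecord:
    """A hypothetical stored record guarded by an exclusive lock (Xlock)."""

    def __init__(self, part_no, qoh):
        self.part_no = part_no
        self.qoh = qoh
        self.xlock = threading.Lock()


def transaction(record, delta):
    # Xlock(P1): a second transaction blocks here until the first unlocks.
    with record.xlock:
        current = record.qoh           # Read P1
        record.qoh = current + delta   # Write P1
    # Unlock: the lock is released when the with-block exits.


p1 = PartRecord("P1", 20)
trans1 = threading.Thread(target=transaction, args=(p1, 15))
trans2 = threading.Thread(target=transaction, args=(p1, -10))
trans1.start()
trans2.start()
trans1.join()
trans2.join()
print(p1.qoh)  # 25
```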
Data Integrity
• A condition in which given data always yield the same result
• Validation or integrity rules should be defined and automatically invoked at run time by the DBMS, regardless of the source of the update, e.g. application program, web page or query language.
• Significant variation exists among DBMSs in the level of support for data integrity.
• ANSI/ISO suggest that 100% of all enterprise rules should be held in the conceptual schema, and specifically none in application programs.
• An area of significant development during the 1990s.
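A minimal sketch of an integrity rule held by the DBMS rather than in programs, assuming Python's sqlite3 module and a hypothetical rule that quantity on hand can never be negative. Any update that breaks the rule is rejected by the database itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise rule declared in the schema: QOH can never be negative.
conn.execute(
    "CREATE TABLE part (part_no TEXT PRIMARY KEY, qoh INTEGER NOT NULL CHECK (qoh >= 0))"
)
conn.execute("INSERT INTO part VALUES ('P1', 20)")


def try_update(sql):
    """Any source of update (program, web page, ad hoc SQL) passes through the DBMS."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected by DBMS"


print(try_update("UPDATE part SET qoh = qoh - 5 WHERE part_no = 'P1'"))   # accepted
print(try_update("UPDATE part SET qoh = qoh - 50 WHERE part_no = 'P1'"))  # rejected by DBMS
print(conn.execute("SELECT qoh FROM part").fetchone()[0])                  # 15
```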
Security
• Protecting data against accidental or intentional use by unauthorised users
• Each user requires identification, e.g. a user-id and password.
• Users can be limited in the data they can see and in the actions they can perform on that data.
• The DBMS can encrypt and decrypt data as it is stored and retrieved.
• Many DBMSs now provide data-value-sensitive security.
• Views are often used to limit users’ access to data.
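A sketch of restricting access with a view, assuming Python's sqlite3 module and a hypothetical DEPT10_STAFF view. SQLite has no user accounts or GRANT statement, so only the view half is shown; in a multi-user DBMS the user would be granted SELECT on the view but not on the underlying EMP table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal REAL, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(7839, "KING", 5000, 10), (7782, "CLARK", 2450, 10), (7566, "JONES", 2975, 20)],
)
# Hypothetical view: hides the SAL column and, value-sensitively,
# every row that does not belong to department 10.
conn.execute("CREATE VIEW dept10_staff AS SELECT empno, ename FROM emp WHERE deptno = 10")

cursor = conn.execute("SELECT * FROM dept10_staff ORDER BY empno")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)  # ['empno', 'ename'] [(7782, 'CLARK'), (7839, 'KING')]
```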
Easy Access to Data
• Objects and data in the database can be created, modified and accessed by executing Structured Query Language (SQL) statements
• SQL provides easy access to data. A 3GL program must read and test each record itself:
20 READ #1, CUSTNO, NAME, BAL
30 IF END #1 GO TO 80
40 IF BAL > 200
50 PRINT CUSTNO, NAME, BAL
60 ENDIF
70 etc
whereas a single SQL statement describes the same result:
SELECT CUSTNO, NAME, BAL FROM CUST WHERE BAL > 200;
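The contrast can be run side by side in Python, assuming the sqlite3 module and hypothetical customer rows: the procedural function loops over records and tests each one, as the 3GL program does, while the declarative one states the condition once in SQL and lets the DBMS find the rows:

```python
import sqlite3

# Hypothetical customer data: (custno, name, bal).
CUSTOMERS = [(1, "Ramas", 150.0), (2, "Dunne", 320.5), (3, "Smith", 975.0)]


def procedural(records):
    """3GL style: read every record and test it in program code."""
    result = []
    for custno, name, bal in records:
        if bal > 200:
            result.append((custno, name, bal))
    return result


def declarative(records):
    """SQL style: state what is wanted and let the DBMS retrieve it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cust (custno INTEGER PRIMARY KEY, name TEXT, bal REAL)")
    conn.executemany("INSERT INTO cust VALUES (?, ?, ?)", records)
    rows = conn.execute(
        "SELECT custno, name, bal FROM cust WHERE bal > 200 ORDER BY custno"
    ).fetchall()
    conn.close()
    return rows


print(procedural(CUSTOMERS) == declarative(CUSTOMERS))  # True
```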
Data Access Using SQL
• The SQL statement is entered:
SQL> SELECT loc
  2  FROM dept;
• The statement is sent to the database.
• The data is displayed:
LOCATION
-------------
NEW YORK
CHICAGO
BOSTON
Overview of SQL
DDL (Data Definition Language)
• the language component of a DBMS that is used to describe the logical, and sometimes physical, structure of a database
• is used to specify the conceptual and internal schemas for the database
DML (Data Manipulation Language)
• the language component of a DBMS that is used to access and modify the contents (data) of a database
Overview of SQL
• DDL
– the SQL commands for data definition are CREATE, ALTER and DROP
– CREATE TABLE: define a table
– ALTER TABLE: add new columns or modify existing columns
– DROP TABLE: delete a table
• DML
– the SQL commands for data manipulation are SELECT, INSERT, UPDATE and DELETE
– SELECT: retrieve data from a table
– INSERT: add a single row or copy rows from other table(s)
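The DDL commands and the first two DML commands can be tried with Python's sqlite3 module. The DEPT rows come from the examples in this topic; DEPT_ARCHIVE is a hypothetical table. SQLite's ALTER TABLE can add a column but cannot modify an existing column's definition, so only ADD COLUMN is shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: CREATE TABLE defines a table; ALTER TABLE adds a new column.
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)")
conn.execute("ALTER TABLE dept ADD COLUMN budget REAL")

# DML: INSERT adds single rows...
conn.execute("INSERT INTO dept (deptno, dname, loc) VALUES (10, 'ACCOUNTING', 'NEW YORK')")
conn.execute("INSERT INTO dept (deptno, dname, loc) VALUES (20, 'RESEARCH', 'DALLAS')")

# ...or copies rows from another table (here into a hypothetical archive table).
conn.execute("CREATE TABLE dept_archive (deptno INTEGER, dname TEXT, loc TEXT)")
conn.execute("INSERT INTO dept_archive SELECT deptno, dname, loc FROM dept")

# DML: SELECT retrieves data.
print(conn.execute("SELECT dname, loc FROM dept_archive ORDER BY deptno").fetchall())
# [('ACCOUNTING', 'NEW YORK'), ('RESEARCH', 'DALLAS')]

# DDL: DROP TABLE deletes the table, both its structure and its data.
conn.execute("DROP TABLE dept_archive")
```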
Overview of SQL
• DML (cont.)
– UPDATE: modify column values
– DELETE: delete rows of data
• Data Control
– COMMIT: commit changes to the database
– ROLLBACK: roll back (undo) changes
• Data Security
– GRANT: grant access privileges to users
– REVOKE: remove access privileges
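UPDATE, DELETE, COMMIT and ROLLBACK can be sketched with Python's sqlite3 module, assuming two rows from the EMP examples in this topic. The connection is opened in autocommit mode so the transaction boundaries are the explicit SQL statements. SQLite has no user accounts, so GRANT and REVOKE cannot be shown here:

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN / COMMIT / ROLLBACK
# below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal REAL)")
conn.execute("INSERT INTO emp VALUES (7369, 'SMITH', 800), (7876, 'ADAMS', 1100)")

conn.execute("BEGIN")
conn.execute("UPDATE emp SET sal = sal + 100 WHERE empno = 7369")  # modify column values
conn.execute("COMMIT")                                             # make the change permanent

conn.execute("BEGIN")
conn.execute("DELETE FROM emp")                                    # delete rows of data...
conn.execute("ROLLBACK")                                           # ...then undo the deletion

print(conn.execute("SELECT empno, ename, sal FROM emp ORDER BY empno").fetchall())
# [(7369, 'SMITH', 900.0), (7876, 'ADAMS', 1100.0)]
```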
DBMS Functions
• Data dictionary management
– stores the definitions of data and their relationships (metadata) in a data dictionary; any changes made are automatically recorded in the data dictionary
• Data storage management
– creates and manages the complex structures required for data storage
• Data transformation and presentation
– transforms entered data to conform to the data structures that are required to store the data
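SQLite keeps its data dictionary in a built-in catalog table, sqlite_master, which can be queried like any other table. A minimal sketch with Python's sqlite3 module, assuming a DEPT table and a hypothetical index name, shows a structural change being recorded automatically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)")


def catalog():
    """Read object definitions (metadata) from SQLite's catalog table."""
    return conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()


def columns(table):
    """PRAGMA table_info returns one row per column; index 1 is the column name."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


print(catalog())        # [('table', 'dept')]
print(columns("dept"))  # ['deptno', 'dname', 'loc']

# A structural change is recorded in the catalog without any extra work.
conn.execute("CREATE INDEX dept_loc_idx ON dept (loc)")
print(catalog())        # [('table', 'dept'), ('index', 'dept_loc_idx')]
```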
DBMS Functions
• Security management
– creates a security system and enforces security within that system
• Multi-user access control
– creates complex structures that allow multiple users to access the data concurrently
• Backup and recovery management
– performs backup and data recovery procedures to ensure data safety
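Backup and recovery can be sketched with the `Connection.backup` method of Python's sqlite3 module (Python 3.7 or later). Both databases are in memory here as an assumption; a real backup would go to separate, durable storage:

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE part (part_no TEXT PRIMARY KEY, qoh INTEGER)")
live.execute("INSERT INTO part VALUES ('P1', 25)")
live.commit()

# Backup: copy the whole live database into a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulated failure: the table is lost from the live database.
live.execute("DROP TABLE part")
live.commit()

# Recovery: restore the live database from the backup copy.
backup.backup(live)
print(live.execute("SELECT qoh FROM part WHERE part_no = 'P1'").fetchone())  # (25,)
```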
DBMS Functions
• Data integrity management
– promotes and enforces integrity rules to eliminate data integrity problems
• Database language and application programming interfaces
– provides access to the data via utility programs and from programming language interfaces
• Database communication interfaces
– provides end-user access to data within a computer network environment
Database Models
• A collection of logical constructs used to represent the data structure and relationships within the database
– Conceptual models: focus on the logical nature of the data representation
– Implementation models: emphasise how the data are represented in the database
• Relationships in Conceptual Models
– One-to-one (1:1)
– One-to-many (1:M)
– Many-to-many (M:N)
• Implementation Database Models
– Hierarchical
– Network
– Relational
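One common way to implement 1:M and M:N relationships in relational tables is sketched below with Python's sqlite3 module. All table and column names are hypothetical, and 1:1 is omitted. A 1:M relationship puts a foreign key on the "many" side; an M:N relationship is replaced by a bridge table holding a foreign key to each side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:M: one DEPT has many EMPs, so the FK goes on the "many" side (EMP).
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT,
                      deptno INTEGER REFERENCES dept (deptno));

    -- M:N: a student takes many classes and a class has many students.
    -- The bridge table ENROLL replaces it with two 1:M relationships.
    CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_name TEXT);
    CREATE TABLE class (class_code TEXT PRIMARY KEY);
    CREATE TABLE enroll (
        stu_num    INTEGER REFERENCES student (stu_num),
        class_code TEXT REFERENCES class (class_code),
        PRIMARY KEY (stu_num, class_code)
    );

    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO class VALUES ('C1'), ('C2');
    INSERT INTO enroll VALUES (1, 'C1'), (1, 'C2'), (2, 'C1');
""")

# How many students are enrolled in each class?
enrolment_counts = conn.execute(
    "SELECT class_code, COUNT(*) FROM enroll GROUP BY class_code ORDER BY class_code"
).fetchall()
print(enrolment_counts)  # [('C1', 2), ('C2', 1)]
```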
Database Models
• Hierarchical
– Logically represented by an upside-down tree
> Each parent can have many children
> Each child has only one parent
• Network
– Each record can have multiple parents
> Composed of sets
> Each set has an owner record and a member record
> A member may have several owners
• Relational
– Perceived by the user as a collection of tables for data storage
– Tables are a series of row/column intersections
– Tables are related by sharing common entity characteristic(s)
Relational Database Model
• Advantages
– Structural independence
– Improved conceptual simplicity
– Easier database design, implementation, management, and use
– Ad hoc query capability with SQL
– Powerful database management system
• Disadvantages
– Substantial hardware and system software overhead
– Poor design and implementation is made easy
– May promote “islands of information” problems
Definition of a Relational Database
A relational database is a collection of relations or two-dimensional tables.
Table Name: DEPT
DEPTNO  DNAME       LOC
10      ACCOUNTING  NEW YORK
20      RESEARCH    DALLAS
30      SALES       CHICAGO
40      OPERATIONS  BOSTON
Table Name: EMP
EMPNO  ENAME  JOB        DEPTNO
7839   KING   PRESIDENT  10
7698   BLAKE  MANAGER    30
7782   CLARK  MANAGER    10
7566   JONES  MANAGER    20
Relational Database Management System
[Figure: the RDBMS server manages both the user tables and the data dictionary]
Relational Database Terminology
EMPNO ENAME  JOB       MGR  HIREDATE  SAL  COMM DEPTNO
----- ------ --------- ---- --------- ---- ---- ------
7839  KING   PRESIDENT      17-NOV-81 5000      10
7698  BLAKE  MANAGER   7839 01-MAY-81 2850      30
7782  CLARK  MANAGER   7839 09-JUN-81 2450      10
7566  JONES  MANAGER   7839 02-APR-81 2975      20
7654  MARTIN SALESMAN  7698 28-SEP-81 1250 1400 30
7499  ALLEN  SALESMAN  7698 20-FEB-81 1600 300  30
7844  TURNER SALESMAN  7698 08-SEP-81 1500 0    30
7900  JAMES  CLERK     7698 03-DEC-81 950       30
7521  WARD   SALESMAN  7698 22-FEB-81 1250 500  30
7902  FORD   ANALYST   7566 03-DEC-81 3000      20
7369  SMITH  CLERK     7902 17-DEC-80 800       20
7788  SCOTT  ANALYST   7566 09-DEC-82 3000      20
7876  ADAMS  CLERK     7788 12-JAN-83 1100      20
7934  MILLER CLERK     7782 23-JAN-82 1300      10
Relational Database Terminology
Relation
• a named collection of attributes
Tuple
• the collection of values that compose one row of a relation
Attribute
• a named characteristic or property of an entity
Properties of Relations
• the tuples of a relation have no ordering (top to bottom)
• the attributes of a relation have no ordering (left to right)
• the entries in the table (attribute values) are single-valued
The degree of a relation is the number of attributes in that relation.
The cardinality of a relation is the number of tuples in that relation.
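Degree and cardinality can be read off a query result. A minimal sketch with Python's sqlite3 module, using the DEPT rows from this topic, counts the attributes via the cursor's column description and the tuples via the fetched rows. Because tuples have no inherent order, a query that needs a particular order must ask for it with ORDER BY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [(10, "ACCOUNTING", "NEW YORK"), (20, "RESEARCH", "DALLAS"),
     (30, "SALES", "CHICAGO"), (40, "OPERATIONS", "BOSTON")],
)

cursor = conn.execute("SELECT * FROM dept")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)
print(degree, cardinality)  # 3 4
```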
Relational Database Terminology
Relationships between entities are supported by attributes that are common to both entities.
Primary Key
• an attribute or attributes that uniquely identify a record instance or tuple in a relation
Foreign Key
• an attribute or combination of attributes of one relation R2 whose values are required to match those of the PK of relation R1, where R1 and R2 are not necessarily distinct (e.g. MGR in EMP refers to EMPNO in EMP)
• a FK and the corresponding PK should be defined on the same domain
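A sketch of PK and FK enforcement with Python's sqlite3 module, assuming cut-down DEPT and EMP tables. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. MGR is a foreign key where R1 and R2 are the same relation (EMP), and DEPTNO refers to a different relation (DEPT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp (
        empno  INTEGER PRIMARY KEY,
        ename  TEXT,
        mgr    INTEGER REFERENCES emp (empno),    -- R1 and R2 are the same relation
        deptno INTEGER REFERENCES dept (deptno)   -- FK to a different relation
    );
    INSERT INTO dept VALUES (10, 'ACCOUNTING');
    INSERT INTO emp VALUES (7839, 'KING', NULL, 10);
    INSERT INTO emp VALUES (7782, 'CLARK', 7839, 10);
""")


def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(attempt("INSERT INTO emp VALUES (7782, 'DUPLICATE', 7839, 10)"))  # PK value already used
print(attempt("INSERT INTO emp VALUES (7566, 'JONES', 7839, 20)"))     # no DEPT 20 exists
print(attempt("INSERT INTO emp VALUES (7900, 'JAMES', 9999, 10)"))     # no EMP 9999 exists
```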
Relating Multiple Tables
• Each row of data in a table is uniquely identified by a primary key (PK).
• You can logically relate data from multiple tables using foreign keys (FK).
Table Name: EMP (primary key: EMPNO; foreign key: DEPTNO)
EMPNO  ENAME  JOB        DEPTNO
7839   KING   PRESIDENT  10
7698   BLAKE  MANAGER    30
7782   CLARK  MANAGER    10
7566   JONES  MANAGER    20
Table Name: DEPT (primary key: DEPTNO)
DEPTNO  DNAME       LOC
10      ACCOUNTING  NEW YORK
20      RESEARCH    DALLAS
30      SALES       CHICAGO
40      OPERATIONS  BOSTON
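A minimal sketch of relating EMP and DEPT through the foreign key, assuming Python's sqlite3 module and the rows shown in the tables above. The join matches each EMP row's DEPTNO value with the DEPT row whose primary key has that value; department 40 has no employees, so it does not appear:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT,
                      deptno INTEGER REFERENCES dept (deptno));
    INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS'),
                            (30, 'SALES', 'CHICAGO'), (40, 'OPERATIONS', 'BOSTON');
    INSERT INTO emp VALUES (7839, 'KING', 'PRESIDENT', 10), (7698, 'BLAKE', 'MANAGER', 30),
                           (7782, 'CLARK', 'MANAGER', 10), (7566, 'JONES', 'MANAGER', 20);
""")

# Match each EMP row's FK value (DEPTNO) with the DEPT row holding that PK value.
rows = conn.execute("""
    SELECT e.ename, d.dname, d.loc
    FROM emp AS e JOIN dept AS d ON e.deptno = d.deptno
    ORDER BY e.empno
""").fetchall()
for row in rows:
    print(row)
```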
Summary
• Information is derived from data, which are usually stored in a database
• A DBMS is software that implements and manages a database
• Databases were developed to address the weaknesses of file systems
• A DBMS
– presents to the user a single data repository that promotes data sharing
– enforces data integrity, minimises redundancy and promotes data security