Chapter 1 of Database Design, Application Development and
Administration
Copyright © 2007 by The McGraw-Hill Companies, Inc.
All rights reserved.
Chapter 1
Textbook: Database Design, Application Development and
Administration
Chapter 1 objectives:
Describe the characteristics of business databases and the features
of database management systems
Appreciate the advances in database technology and the contribution
of database technology to modern society
Understand the impact of database management system architectures
on distributed processing and software maintenance
Perceive career opportunities related to database application
development and database administration
Welcome!
Database technology: crucial to the operation and management of
modern organizations
Major transformation in computing skills
Significant time commitment
Exciting journey ahead
Welcome to Chapter 1, an introduction to database management
Database management is crucial to the operation and management of
modern organizations:
- infrastructure (plumbing) for daily business operations
- raw materials for long range decision making
Transformation: as significant as learning computer programming and
algebra
Time: assignments and projects; lots of practical skills; detailed
textbook
Database field:
- Challenging work (sometimes too challenging);
- Very dynamic field: much new R & D
Practical textbook
Detailed material
- Designed for students without a previous course in database
management
- Beneficial even for those with significant database
experience
First part of text: Overview of database management and overview of
database development
Second part: fundamentals of relational databases; Relational data
model, SQL, basic query formulation skills;
Third part of text: data modeling and conversion; developing a
database; skills used by a database specialist or functional user
developing a database
Fourth part: relational database design involving normalization and
physical database design
Fifth part: application development emphasis; advanced query
formulation skills; data requirements for forms/reports; triggers
and stored procedures;
Sixth part: advanced database development with view integration
(linking database design and application development);
comprehensive case
Seventh part of textbook: background on database administration and
specialized processing areas (transaction management, data
warehouses, distributed processing, object data management)
Detailed material: developing skills requires lots of
practice
Not a theoretical textbook (does not prove theorems nor present any
axioms)
Outline
Essential characteristics of databases
Organizational roles: how you might be using a DBMS
Information: transformed data that has value for decision
making
Essential to organize data for retrieval and maintenance
Most organizations have a flood of data (too much data is the
problem); web proliferation has greatly multiplied the amount of
data
Conventional facts: names, DOBs, salaries, interest rates, codes
(major)
Unconventional facts: images, engineering drawings, maps, product
videos, fingerprints, time series (useful for forecasting), web
pages
Distinction sometimes made between data and information: raw facts
need interpretation, combination, formatting, etc. to be useful for
decision making
Databases are ubiquitous; many encounters this week
Persistent:
- Lasts longer than the execution of a computer program
- Program variables are not stored in a database
- Relevance of intended usage: only store potentially relevant
data
Inter-related:
- Entity: cluster of data about a topic (customer, student,
loan)
- Relationship: connection among entities
Shared:
- Multiple uses: hundreds to thousands of data entry screens and
reports
- Multiple users: many people simultaneously use a database
University Database
To depict these characteristics, let us consider a number of
databases. We begin with a simple university database (Figure 1)
since you have some familiarity with the workings of a university.
A simplified university database contains data about students,
faculty, courses, course offerings, and enrollments. The database
supports procedures such as registering for classes, assigning
faculty to course offerings, recording grades, and scheduling
course offerings. Relationships in the university database support
answers to questions such as
· What offerings are available for a course in a given academic period?
· Who is the instructor for an offering of a course?
· What students are enrolled in an offering of a course?
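As a rough illustration of how relationships support such questions, the sketch below builds a tiny university database with Python's built-in sqlite3 module and answers the third question. The table names, column names (StdNo, StdName, OffTerm), and sample rows are assumptions made for this example, not the textbook's actual schema.

```python
import sqlite3

# In-memory database; all names and rows here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student    (StdNo INTEGER PRIMARY KEY, StdName TEXT);
CREATE TABLE Offering   (OfferNo INTEGER PRIMARY KEY, CourseNo TEXT, OffTerm TEXT);
CREATE TABLE Enrollment (OfferNo INTEGER REFERENCES Offering (OfferNo),
                         StdNo   INTEGER REFERENCES Student (StdNo),
                         PRIMARY KEY (OfferNo, StdNo));
INSERT INTO Student    VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Offering   VALUES (100, 'IS320', 'FALL'), (200, 'IS480', 'FALL');
INSERT INTO Enrollment VALUES (100, 1), (100, 3), (200, 2);
""")

# "What students are enrolled in an offering of a course?" for offering 100.
# The Enrollment table links each student to the offerings they attend.
enrolled = [row[0] for row in conn.execute(
    """SELECT StdName
       FROM Student JOIN Enrollment ON Student.StdNo = Enrollment.StdNo
       WHERE Enrollment.OfferNo = ?
       ORDER BY StdName""", (100,))]
print(enrolled)
```

The other two questions follow the same pattern: each is a query that follows a relationship from one table to another.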
Enterprise DBMSs
Desktop DBMSs
Embedded DBMSs
DBMS (Database Management System): collection of components (mostly
software)
Enterprise DBMS: supports mission critical information systems;
very large databases, many users, tight performance requirements
Desktop DBMS: end user departments and small databases
Embedded DBMS: resides in a larger system, either an application or
a device such as a Personal Digital Assistant or smart card.
Embedded DBMSs provide limited transaction processing features but
have low memory, processing, and storage requirements.
Features common to most DBMSs: database definition, non procedural
access, application development, procedural language interface,
transaction processing
Tables and relationships
Fundamental difference from other productivity software: amount of
planning required; the database must be defined before it is used
Table: two-dimensional arrangement of data; relationship: linking
columns among tables
SQL: industry standard database language
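A minimal sketch of database definition, assuming SQLite as the DBMS: two tables are defined with SQL CREATE TABLE statements, and a linking column (FacNo) expresses the relationship between them. OfferNo, OffLocation, and OffTime appear in the slides; FacNo, FacName, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is turned on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Faculty  (FacNo INTEGER PRIMARY KEY, FacName TEXT NOT NULL);
CREATE TABLE Offering (OfferNo     INTEGER PRIMARY KEY,
                       OffLocation TEXT,
                       OffTime     TEXT,
                       -- linking column: each offering refers to a faculty row
                       FacNo       INTEGER REFERENCES Faculty (FacNo));
INSERT INTO Faculty  VALUES (1, 'Lee');
INSERT INTO Offering VALUES (100, 'BLM302', '10:30', 1);
""")

# An offering that refers to a nonexistent faculty member violates the
# relationship, so the DBMS refuses to store it.
try:
    conn.execute("INSERT INTO Offering VALUES (200, 'BLM214', '13:30', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Because the relationship is part of the definition, the DBMS rather than each application is responsible for keeping the linked data consistent.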
Access relationship window
5 tables (student, enrollment, course, offering, faculty):
faculty_1 is not a real table (details later)
Relationships: lines connecting tables (faculty to offering); not
all tables are directly connected
Must define the tables and relationships before entering data and
retrieving data
Tables
Relationships
University Database diagram drawn with an external tool (Visio
Professional);
Learn Entity Relationship Diagrams in second part of course
- Entity: similar to a table
- Relationship: connection among entities with names and connection
symbols
Can use third party tools for database definition
[Figure: university database diagram; visible attribute labels include OfferNo, OffLocation, OffTime]
Query: request for data to answer a question
Indicate what parts of database to retrieve not the procedural
details
Improve productivity and improve accessibility
SQL SELECT statement and graphical tools
Specify what not how
Loop buster: no loops; major difference between procedural and
nonprocedural language
Trip planning analogy: specify features of trip (destination,
quality of accommodations, dates, …) but not details (route, hotel
research, flight research, …)
Productivity improvement: 100 times fewer lines of code
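The "what, not how" idea can be sketched by answering one question twice: once with explicit loops over in-memory lists, and once with a SQL SELECT statement run through Python's sqlite3 module. The Faculty and Offering columns and the sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample data: (FacNo, FacName, FacDept) and (OfferNo, CourseNo, FacNo).
faculty = [(1, "Lee", "MS"), (2, "Kim", "FIN")]
offerings = [(100, "IS320", 1), (200, "FIN300", 2), (300, "IS480", 1)]

# Procedural access: the programmer spells out HOW to match the records.
procedural = []
for offer_no, course_no, off_fac_no in offerings:
    for fac_no, fac_name, fac_dept in faculty:
        if off_fac_no == fac_no and fac_dept == "MS":
            procedural.append((course_no, fac_name))
procedural.sort()

# Nonprocedural access: the query states WHAT is wanted; there are no loops.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty (FacNo INTEGER, FacName TEXT, FacDept TEXT)")
conn.execute("CREATE TABLE Offering (OfferNo INTEGER, CourseNo TEXT, FacNo INTEGER)")
conn.executemany("INSERT INTO Faculty VALUES (?, ?, ?)", faculty)
conn.executemany("INSERT INTO Offering VALUES (?, ?, ?)", offerings)
nonprocedural = conn.execute(
    """SELECT CourseNo, FacName
       FROM Offering JOIN Faculty ON Offering.FacNo = Faculty.FacNo
       WHERE FacDept = 'MS'
       ORDER BY CourseNo, FacName""").fetchall()

print(procedural == nonprocedural)
```

Both versions return the same rows, but only the loop version commits the programmer to a particular search strategy; the DBMS is free to choose its own for the SELECT statement.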
Query Design (Access)
Report: formatted document for display
Use nonprocedural access to specify data requirements of forms and
reports
Nonprocedural access by itself is not useful because of default
output appearance
Nonprocedural access combined with graphical tools for form and
report development is very powerful
Non-procedural access makes form and report creation possible
without extensive coding. As part of creating a form or report, the
user indicates the data requirements using a non-procedural
language (SQL) or graphical tool. To complete a form or report
definition, the user indicates formatting of data, user
interaction, and other details.
Faculty assignment form
The form can be used to add new course assignments for a professor
and to change existing assignments.
Sample Report
The report uses indentation to show courses taught by faculty in
various departments. The indentation style can be easier to view
than the tabular style shown as default output style.
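A rough sketch of how a report combines nonprocedural access with formatting: a SELECT statement supplies the data requirements, and a few lines of Python add the indentation by department and faculty member. This only imitates what a report writer does; the Workload table, its columns, and its rows are hypothetical.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Workload (Dept TEXT, FacName TEXT, CourseNo TEXT);
INSERT INTO Workload VALUES
  ('FIN', 'Kim', 'FIN300'),
  ('MS',  'Lee', 'IS320'),
  ('MS',  'Lee', 'IS480'),
  ('MS',  'Ray', 'IS460');
""")

# Data requirements: stated nonprocedurally. Sorting matters because
# groupby only groups rows that are adjacent.
rows = conn.execute(
    "SELECT Dept, FacName, CourseNo FROM Workload "
    "ORDER BY Dept, FacName, CourseNo").fetchall()

# Formatting: one indentation level per grouping level.
lines = []
for dept, dept_rows in groupby(rows, key=lambda r: r[0]):
    lines.append(dept)
    for fac_name, fac_rows in groupby(dept_rows, key=lambda r: r[1]):
        lines.append("  " + fac_name)
        for _, _, course_no in fac_rows:
            lines.append("    " + course_no)
report = "\n".join(lines)
print(report)
```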
Why
Combine external languages (COBOL, Java, C, C++, …) with SQL
New DBMS specific languages: PL/SQL (Oracle), Transact-SQL (SQL
Server)
Batch processing: much business processing is batch (collect loan
applications and process together); online processing is becoming
more prevalent because of the web;
Customization: customize the behavior of a data entry form
Automation: rule processing; check quantity on hand (QOH) when an
order is placed
Performance: more control with a procedural language
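The automation point can be sketched by combining a host language with SQL. Here Python stands in for the external language (the slides name COBOL, Java, C, and C++), and the rule "check quantity on hand when an order is placed" is coded around two SQL statements. The Product and OrderLine tables, their columns, and the place_order function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product   (ProdNo INTEGER PRIMARY KEY, ProdQOH INTEGER NOT NULL);
CREATE TABLE OrderLine (OrdNo INTEGER, ProdNo INTEGER, Qty INTEGER);
INSERT INTO Product VALUES (1, 10);
""")

def place_order(conn, ord_no, prod_no, qty):
    """Record an order line only if enough stock is on hand.

    Returns True if the order was recorded, False otherwise.
    """
    row = conn.execute(
        "SELECT ProdQOH FROM Product WHERE ProdNo = ?", (prod_no,)).fetchone()
    if row is None or row[0] < qty:
        return False  # rule: unknown product or not enough quantity on hand
    with conn:  # both statements are committed together
        conn.execute("INSERT INTO OrderLine VALUES (?, ?, ?)",
                     (ord_no, prod_no, qty))
        conn.execute("UPDATE Product SET ProdQOH = ProdQOH - ? WHERE ProdNo = ?",
                     (qty, prod_no))
    return True

first = place_order(conn, 1, 1, 6)    # 10 on hand, 6 requested
second = place_order(conn, 2, 1, 6)   # only 4 left, 6 requested
remaining = conn.execute(
    "SELECT ProdQOH FROM Product WHERE ProdNo = 1").fetchone()[0]
print(first, second, remaining)
```

In a DBMS-specific language such as PL/SQL or Transact-SQL, a rule like this would more likely live inside the database as a trigger or stored procedure, so every application placing orders would be subject to it.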
Control simultaneous users
Recover from failures
Major difference between enterprise and desktop DBMSs: transaction
processing ability; major cost difference
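Recovery from failures can be illustrated with a small transaction sketch, assuming SQLite through Python's sqlite3 module: registering a student takes two changes, and if the second one fails the first is undone as well. The SeatsLeft column, the CHECK rule, and the register function are assumptions for this example; control of simultaneous users is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Offering   (OfferNo   INTEGER PRIMARY KEY,
                         SeatsLeft INTEGER NOT NULL CHECK (SeatsLeft >= 0));
CREATE TABLE Enrollment (OfferNo INTEGER, StdNo INTEGER,
                         PRIMARY KEY (OfferNo, StdNo));
INSERT INTO Offering VALUES (100, 1);
""")

def register(conn, offer_no, std_no):
    """Enroll a student and take a seat as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO Enrollment VALUES (?, ?)",
                         (offer_no, std_no))
            conn.execute(
                "UPDATE Offering SET SeatsLeft = SeatsLeft - 1 WHERE OfferNo = ?",
                (offer_no,))
        return True
    except sqlite3.IntegrityError:
        return False

ok_first = register(conn, 100, 1)    # takes the last seat
ok_second = register(conn, 100, 2)   # UPDATE violates the CHECK rule
enrolled = conn.execute("SELECT COUNT(*) FROM Enrollment").fetchone()[0]
print(ok_first, ok_second, enrolled)
```

The second registration leaves no partial enrollment behind: the database returns to the state it had before the failed transaction began.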
Database Technology Evolution
The first generation supported sequential and random searching, but
the user was required to write a computer program to obtain
access.
The second generation products were the first true DBMSs as they
could manage multiple entity types and relationships. However, to
obtain access to data, a computer program still had to be written.
Second generation systems are referred to as “navigational” because
the programmer had to write code to navigate among a network of
linked records.
Third generation systems are known as relational DBMSs because of
the foundation based on mathematical relations and associated
operators. Optimization technology was developed so that access
using non-procedural languages would be efficient.
Fourth generation systems can store and manipulate unconventional
data types such as images, videos, maps, sounds, and animations.
Because these systems view any kind of data as an object to manage,
fourth generation systems are sometimes called “object-oriented” or
“object-relational”. In addition to the emphasis on objects, the
Internet is pushing DBMSs to develop new forms of distributed
processing.
Era     Generation   Orientation
1960s   1st          File (file structures and proprietary program interfaces)
1970s   2nd          Navigational
1980s   3rd          Relational
1990s   4th          Object
SQL Server: strong in Windows
DB2: strong in mainframe environment
Significant open source DBMSs: MySQL, Firebird, PostgreSQL
Desktop DBMS
Access: dominates
FoxPro, Paradox, Approach, FileMaker Pro
According to the International Data Corporation (IDC), sales
(license and maintenance) of enterprise database software reached
$13.6 billion in 2003, a 7.6% increase over 2002. Enterprise
DBMSs use mainframe servers running IBM’s MVS operating system and
mid-range servers running Unix (Linux, Solaris, AIX, and other
variations) and Microsoft Windows Server operating systems. Sales
of enterprise database software have followed economic conditions
with large increases during the Internet boom years followed by
slow growth during the dot-com and telecom slowdowns. For future
sales, IDC projects sales of enterprise DBMSs to reach $20 billion
by 2008.
According to IDC, three products dominate the market for enterprise
database software as shown in Table 1-3. The IDC rankings include
both license and maintenance revenues. When considering only
license costs, the Gartner Group ranks IBM with the largest market
share at 35.7%, followed by Oracle at 33.4%, and Microsoft at
17.7%. The overall market is very competitive with the major
companies and smaller companies introducing many new features with
each release.
Open source DBMS products have begun to challenge the commercial
DBMS products at the low end of the enterprise DBMS market.
Although source code for open source DBMS products is available
without charge, most organizations purchase support contracts so
the open source products are not free. Still, many organizations
have reported cost savings using open source DBMS products, mostly
for non-mission-critical systems. MySQL, first introduced in 1995,
is the leader in the open source DBMS market. PostgreSQL and open
source Ingres are mature open source DBMS products. Firebird is a new
open source product that is gaining usage.
Data Independence
Software maintenance is a large part (50%) of information system
budgets
Reduce impact of changes by separating database description from
applications
Change database definition with minimal effect on applications that
use the database
Data Independence: a database should have an identity separate from
the applications (computer programs, forms, and reports) that use
it. The separate identity allows the database definition to be
changed without affecting related applications.
The close association between a database and related programs led
to problems in software maintenance. Software maintenance
encompassing requirement changes, corrections, and enhancements can
consume a large fraction of computer budgets. In early DBMSs, most
changes to the database definition caused changes to computer
programs. In many cases, changes to computer programs involved
detailed inspection of the code, a labor-intensive process. This
code inspection work is similar to year 2000 compliance work, in which
two-digit year formats had to be changed to four digits. Performance tuning of a
database was difficult because sometimes hundreds of computer
programs had to be recompiled for every change. Because database
definition changes are common, a large fraction of software
maintenance resources were devoted to database changes. Some
studies have estimated the percentage as high as 50% of software
maintenance resources.
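A minimal sketch of data independence, assuming SQLite and hypothetical names throughout (FacultyView, Faculty2, Department, faculty_in_dept): the application function reads only from a view, so the base tables can be restructured without touching the application code.

```python
import sqlite3

def faculty_in_dept(conn, dept_name):
    """Application code: it names only the view, never the base tables."""
    return [row[0] for row in conn.execute(
        "SELECT FacName FROM FacultyView WHERE DeptName = ? ORDER BY FacName",
        (dept_name,))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty (FacNo INTEGER PRIMARY KEY, FacName TEXT, DeptName TEXT);
INSERT INTO Faculty VALUES (1, 'Lee', 'MS'), (2, 'Kim', 'FIN');
CREATE VIEW FacultyView AS SELECT FacNo, FacName, DeptName FROM Faculty;
""")
before = faculty_in_dept(conn, "MS")

# Database definition change: department names move into their own table.
# Only the view definition is rewritten to match the new base tables.
conn.executescript("""
DROP VIEW FacultyView;
CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
INSERT INTO Department VALUES (10, 'MS'), (20, 'FIN');
CREATE TABLE Faculty2 (FacNo INTEGER PRIMARY KEY, FacName TEXT, DeptNo INTEGER);
INSERT INTO Faculty2
  SELECT FacNo, FacName, DeptNo FROM Faculty JOIN Department USING (DeptName);
DROP TABLE Faculty;
CREATE VIEW FacultyView AS
  SELECT FacNo, FacName, DeptName FROM Faculty2 JOIN Department USING (DeptNo);
""")
after = faculty_in_dept(conn, "MS")
print(before == after)
```

The function faculty_in_dept runs unchanged before and after the restructuring, which is the kind of maintenance saving the slide describes.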
Schema levels:
- Internal level: implementation details for base tables (indexes,
disk extents, clustering)
- Chapter 10 for physical database design
Mappings:
- Performed by the DBMS: relieve user of much work
- External to Conceptual: submit query using a view; DBMS
translates to base tables
- Conceptual to Internal: SELECT statement implemented with loops,
join order, index usage, …
- Use views rather than base tables in applications
- DBMS translates queries on a view to query on lower level
schema
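The conceptual-to-internal mapping can be glimpsed with SQLite's EXPLAIN QUERY PLAN statement, which reports how the DBMS intends to carry out a SELECT statement. In this sketch an index is added at the internal level; the query text stays the same and returns the same rows, while the reported plan is expected to change. The Student columns and the index name StdMajorIdx are assumptions, and the exact plan wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (StdNo INTEGER PRIMARY KEY, StdName TEXT, StdMajor TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(1, "Ann", "IS"), (2, "Bob", "FIN"), (3, "Cy", "IS")])

query = "SELECT StdName FROM Student WHERE StdMajor = 'IS' ORDER BY StdName"

def plan(conn, sql):
    """Return SQLite's textual description of how it will run the statement."""
    # The description is the last column of each EXPLAIN QUERY PLAN row.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(conn, query)
result_before = conn.execute(query).fetchall()

# Internal-level change only: no table, view, or query is modified.
conn.execute("CREATE INDEX StdMajorIdx ON Student (StdMajor)")

plan_after = plan(conn, query)
result_after = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
print(result_before == result_after)
```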
External
FacultyAssignmentFormView: data required for the form in Slide 16
(Figure 1.9)
FacultyWorkLoadReportView: data required for the report in Slide 17
(Figure 1.10)
Conceptual: tables in Slide 11
Internal
Extra files to improve performance
To make the three schema levels clearer, Table 4 shows differences
among database definition at the three schema levels using examples
from the features described in Section 1.2. Even in a simplified
university database, the differences among the schema levels are
clear. With a more complex database, the differences would be even
more pronounced with many more views, a much larger conceptual
schema, and a more complex internal schema.
The schema mappings describe how a schema at a higher level is
derived from a schema at a lower level. For example, the external
views in Table 3 are derived from the tables in the conceptual
schema. The mapping provides the knowledge to convert a request
using an external view (for example, HighGPAView) into a request
using the tables in the conceptual schema. The mapping between
conceptual and internal levels shows how entities are stored in
files.
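To make the external-to-conceptual mapping concrete, the sketch below defines a view named HighGPAView over a Student table and shows that a request on the view gives the same answer as the equivalent request written against the base table. The view name comes from the text; the Student columns, the 3.5 threshold, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StdNo INTEGER PRIMARY KEY, StdName TEXT,
                      StdMajor TEXT, StdGPA REAL);
INSERT INTO Student VALUES
  (1, 'Ann', 'IS', 3.9), (2, 'Bob', 'FIN', 3.1),
  (3, 'Cy',  'IS', 3.6), (4, 'Di',  'FIN', 3.8);

-- The view definition is the mapping: it records how the external view
-- is derived from the conceptual schema.
CREATE VIEW HighGPAView AS
  SELECT StdNo, StdName, StdMajor, StdGPA FROM Student WHERE StdGPA > 3.5;
""")

# Request written against the external view.
via_view = conn.execute(
    "SELECT StdName FROM HighGPAView WHERE StdMajor = 'IS' ORDER BY StdName"
).fetchall()

# The equivalent request against the conceptual schema, which is roughly
# what the DBMS derives by applying the mapping.
via_tables = conn.execute(
    "SELECT StdName FROM Student "
    "WHERE StdGPA > 3.5 AND StdMajor = 'IS' ORDER BY StdName"
).fetchall()

print(via_view == via_tables)
```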
Client-Server Architecture: an arrangement of components (clients
and servers) and data among computers connected by a network. The
client-server architecture supports efficient processing of
messages (requests for service) between clients and servers.
To improve performance and availability of data, the client-server
architecture supports many ways to distribute software and data in
a computer network. The simplest scheme is just to place both
software and data on the same computer (Figure 13(a)). To take
advantage of a network, both software and data can be distributed.
In Figure 13(b), the server software and database are located on a
remote computer. In Figure 13(c), the server software and database
are located on multiple remote computers.
Organizational Roles
Because databases are pervasive, there are a variety of ways in
which you may interact with databases. The classification in Figure
14 distinguishes between functional users who interact with
databases as part of their work and information systems
professionals who participate in designing and implementing
databases. Each box in the hierarchy represents a role that you may
play. You may simultaneously play more than one role. For example,
a functional user in a job such as financial analysis may play all
three roles in different databases. In some organizations, the
distinction between functional users and information systems
professionals is blurred. In these organizations, functional users
may participate in designing and using databases.
Functional users can play a passive or an active role when
interacting with databases. Indirect usage of a database is a
passive role. An indirect user is given a report or some data
extracted from a database. A parametric user is more active than an
indirect user. A parametric user requests existing forms or reports
using parameters, input values that change from usage to usage. For
example, a parameter may indicate a date range, sales territory, or
department name. The power user is the most active. Because
decision making needs can be difficult to predict, ad hoc or
unplanned usage of a database is important. A power user is skilled
enough to build a form or report when needed. Power users should
have a good understanding of non-procedural access, a skill
described in the first part of this book.
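Parametric usage can be sketched as an existing, prewritten query whose parameters (here a sales territory and a date range) change from usage to usage. The Sale table, its columns, the sample rows, and the territory_sales function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sale (SaleDate TEXT, Territory TEXT, Amount REAL);
INSERT INTO Sale VALUES
  ('2007-01-15', 'West', 100.0),
  ('2007-02-10', 'West', 250.0),
  ('2007-02-20', 'East', 75.0),
  ('2007-03-05', 'West', 40.0);
""")

def territory_sales(conn, territory, start_date, end_date):
    """Prebuilt report query: the user supplies only the parameter values."""
    row = conn.execute(
        """SELECT COALESCE(SUM(Amount), 0)
           FROM Sale
           WHERE Territory = ? AND SaleDate BETWEEN ? AND ?""",
        (territory, start_date, end_date)).fetchone()
    return row[0]

# Same report, different parameter values on each usage.
# Dates are stored as ISO text, so string comparison orders them correctly.
west_q1 = territory_sales(conn, "West", "2007-01-01", "2007-03-31")
east_feb = territory_sales(conn, "East", "2007-02-01", "2007-02-28")
print(west_q1, east_feb)
```

A power user, by contrast, would write or modify the SELECT statement itself rather than only supplying values for the placeholders.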
Data administrator
Both positions require more than one database course
- 2nd course
Summary
Nonprocedural access is a crucial feature
Many opportunities to work with databases
Working with databases: can be lucrative but very demanding
First part of textbook: fundamentals of relational databases
Other chapters in Part 1:
Chapter 2: overview of the database development process