Chapter 1 of Database Design, Application Development and Administration. Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 1
Textbook: Database Design, Application Development and Administration
Chapter 1 objectives:
Describe the characteristics of business databases and the features of database management systems
Appreciate the advances in database technology and the contribution of database technology to modern society
Understand the impact of database management system architectures on distributed processing and software maintenance
Perceive career opportunities related to database application development and database administration
Welcome!
Database technology: crucial to the operation and management of modern organizations
Major transformation in computing skills
Significant time commitment
Exciting journey ahead
Welcome to Chapter 1 on introduction to database management
Database management is crucial to the operation and management of modern organizations:
- infrastructure (plumbing) for daily business operations
- raw materials for long range decision making
Transformation: as significant as learning computer programming and algebra
Time: assignments and projects; lots of practical skills; detailed textbook
Database field:
- Challenging work (sometimes too challenging);
- Very dynamic field: much new R & D
Practical textbook
Detailed material
- Designed for students without a previous course in database management
- Beneficial even for those with significant database experience
First part of text: Overview of database management and overview of database development
Second part: fundamentals of relational databases; Relational data model, SQL, basic query formulation skills;
Third part of text: data modeling and conversion; developing a database; skill used by database specialist or functional user developing a database
Fourth part: relational database design involving normalization and physical database design
Fifth part: application development emphasis; advanced query formulation skills; data requirements for forms/reports; triggers and stored procedures;
Sixth part: advanced database development with view integration (linking database design and application development); comprehensive case
Seventh part of textbook: background on database administration and specialized processing area (transaction management, data warehouses, distributed processing, object data management)
Detailed material: developing skills requires lots of practice
Not a theoretical textbook (does not prove theorems nor present any axioms)
Outline
Essential characteristics of databases
Organizational roles: how you might be using a DBMS
Information: transformed data that has value for decision making
Essential to organize data for retrieval and maintenance
Most organizations have a flood of data (too much data is the problem); web proliferation has greatly multiplied the amount of data
Conventional facts: names, DOBs, salaries, interest rates, codes (major)
Unconventional facts: images, engineering drawings, maps, product videos, fingerprints, time series (useful for forecasting), web pages
Distinction sometimes made between data and information: raw facts need interpretation, combination, formatting, etc. to be useful for decision making
Databases are ubiquitous; many encounters this week
Persistent:
- Lasts longer than the execution of a computer program
- Program variables are not stored in a database
- Relevance of intended usage: only store potentially relevant data
Inter-related:
- Entity: cluster of data about a topic (customer, student, loan)
- Relationship: connection among entities
Shared:
- Multiple uses: hundreds to thousands of data entry screens and reports
- Multiple users: many people simultaneously use a database
University Database
To depict these characteristics, let us consider a number of databases. We begin with a simple university database (Figure 1) since you have some familiarity with the workings of a university. A simplified university database contains data about students, faculty, courses, course offerings, and enrollments. The database supports procedures such as registering for classes, assigning faculty to course offerings, recording grades, and scheduling course offerings. Relationships in the university database support answers to questions such as
· What offerings are available for a course in a given academic period?
· Who is the instructor for an offering of a course?
· What students are enrolled in an offering of a course?
Enterprise DBMSs
Desktop DBMSs
Embedded DBMSs
DBMS (Database Management System): collection of components (mostly software)
Enterprise DBMS: supports mission critical information systems; very large dbs, many users, tight performance requirements
Desktop DBMS: end user departments and small databases
Embedded DBMS: resides in a larger system, either an application or a device such as a Personal Digital Assistant or smart card. Embedded DBMSs provide limited transaction processing features but have low memory, processing, and storage requirements.
Features common to most DBMSs: database definition, non procedural access, application development, procedural language interface, transaction processing
Tables and relationships
Fundamental difference from other productivity software: a database must be planned and defined before it can be used
Table: two-dimensional arrangement of data; relationship: connection among tables through linking columns
SQL: industry standard database language
Access relationship window
5 tables (student, enrollment, course, offering, faculty): faculty_1 is not a real table (details later)
Relationships: lines connecting tables (faculty to offering); not all tables are directly connected
Must define the tables and relationships before entering data and retrieving data
Tables
Relationships
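The requirement to define tables and relationships before entering data can be sketched in code. The example below is a minimal illustration, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for the DBMS. The Faculty table, the FacNo and FacName columns, and the sample rows are hypothetical; only OfferNo, OffLocation, and OffTime appear in the slides.

```python
import sqlite3

# Hypothetical definition of two tables and one relationship.
# FacNo in Offering is the linking column that refers to Faculty.
SCHEMA = """
CREATE TABLE Faculty (
    FacNo   TEXT PRIMARY KEY,
    FacName TEXT NOT NULL
);
CREATE TABLE Offering (
    OfferNo     INTEGER PRIMARY KEY,
    OffLocation TEXT,
    OffTime     TEXT,
    FacNo       TEXT REFERENCES Faculty (FacNo)
);
"""

def build_database():
    """Create an in-memory database with the definition above."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this setting is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

conn = build_database()
conn.execute("INSERT INTO Faculty VALUES ('F1', 'Ada')")
conn.execute("INSERT INTO Offering VALUES (1234, 'BLM302', '10:30', 'F1')")

# An offering that refers to an unknown faculty member violates the relationship.
try:
    conn.execute("INSERT INTO Offering VALUES (5678, 'BLM412', '13:30', 'F9')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because the relationship is part of the definition, the DBMS itself rejects the second offering; no application code has to check that faculty member F9 exists.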
University Database diagram drawn with an external tool (Visio Professional);
Learn Entity Relationship Diagrams in second part of course
- Entity: similar to a table
- Relationship: connection among entities with names and connection symbols
Can use third party tools for database definition
[Figure: entity relationship diagram of the university database; visible attribute labels: OfferNo, OffLocation, OffTime]
Query: request for data to answer a question
Indicate what parts of database to retrieve not the procedural details
Improve productivity and improve accessibility
SQL SELECT statement and graphical tools
Specify what not how
Loop buster: no loops; major difference between procedural and nonprocedural language
Trip planning analogy: specify features of trip (destination, quality of accommodations, dates, …) but not details (route, hotel research, flight research, …)
Productivity improvement: 100 times fewer lines of code
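The "specify what, not how" idea can be made concrete by answering one of the university database questions (what students are enrolled in an offering?) in two ways. This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for the DBMS, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StdNo TEXT PRIMARY KEY, StdName TEXT);
CREATE TABLE Enrollment (OfferNo INTEGER, StdNo TEXT);
INSERT INTO Student VALUES ('S1', 'Bob'), ('S2', 'Sue'), ('S3', 'Joe');
INSERT INTO Enrollment VALUES (1234, 'S1'), (1234, 'S3'), (5678, 'S2');
""")

def enrolled_procedural(students, enrollments, offer_no):
    """Procedural: the programmer writes the loops and the matching logic."""
    names = []
    for e_offer, e_std in enrollments:
        if e_offer != offer_no:
            continue
        for s_no, s_name in students:
            if s_no == e_std:
                names.append(s_name)
    return sorted(names)

def enrolled_nonprocedural(conn, offer_no):
    """Nonprocedural: state what is wanted; the DBMS decides how to get it."""
    rows = conn.execute(
        """SELECT StdName
           FROM Student JOIN Enrollment ON Student.StdNo = Enrollment.StdNo
           WHERE OfferNo = ?
           ORDER BY StdName""",
        (offer_no,),
    )
    return [name for (name,) in rows]
```

Both functions return the same names for offering 1234, but only the first one contains loops. The SELECT statement names the tables, the matching condition, and the desired order, and leaves the access strategy to the DBMS.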
Query Design (Access)
Report: formatted document for display
Use nonprocedural access to specify data requirements of forms and reports
Nonprocedural access by itself is not useful because of default output appearance
Nonprocedural access combined with graphical tools for form and report development is very powerful
Non-procedural access makes form and report creation possible without extensive coding. As part of creating a form or report, the user indicates the data requirements using a non-procedural language (SQL) or graphical tool. To complete a form or report definition, the user indicates formatting of data, user interaction, and other details.
Faculty assignment form
The form can be used to add new course assignments for a professor and to change existing assignments.
Sample Report
The report uses indentation to show courses taught by faculty in various departments. The indentation style can be easier to view than the tabular style shown as default output style.
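The split between data requirements and formatting can be sketched as follows. The query states which data the report needs; a few lines of formatting code then produce the indented layout. This is only an illustration, assuming SQLite via Python's sqlite3 module; the single Workload table, its columns, and the sample rows are hypothetical simplifications, not the textbook's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Workload (Dept TEXT, FacName TEXT, CourseNo TEXT);
INSERT INTO Workload VALUES
  ('FIN', 'Ng', 'FIN300'), ('MS', 'Lee', 'IS320'),
  ('MS', 'Lee', 'IS460'), ('MS', 'Cho', 'IS480');
""")

def workload_report(conn):
    """Return an indented report: department, then faculty, then courses."""
    # Data requirements: expressed nonprocedurally with SELECT.
    rows = conn.execute(
        "SELECT Dept, FacName, CourseNo FROM Workload "
        "ORDER BY Dept, FacName, CourseNo"
    )
    # Formatting: print a heading only when the department or faculty changes.
    lines, last_dept, last_fac = [], None, None
    for dept, fac, course in rows:
        if dept != last_dept:
            lines.append(dept)
            last_dept, last_fac = dept, None
        if fac != last_fac:
            lines.append("  " + fac)
            last_fac = fac
        lines.append("    " + course)
    return "\n".join(lines)
```

Calling print(workload_report(conn)) shows each department once, each faculty member indented beneath it, and the courses indented one level further. A report tool in a DBMS generates this kind of formatting logic from a graphical definition rather than hand-written code.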
Why
Combine external languages (COBOL, Java, C, C++, …) with SQL
New DBMS specific languages: PL/SQL (Oracle), Transact-SQL (SQL Server)
Batch processing: much business processing is batch (collect loan applications and process together); online processing is becoming more prevalent because of the web;
Customization: customize the behavior of a data entry form
Automation: rule processing; for example, check the quantity on hand (QOH) when an order is placed
Performance: more control with a procedural language
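The combination of an external language with SQL, and the automation rule about quantity on hand, can be sketched together. The example assumes Python as the external language and SQLite as the DBMS; the Product and OrderLine tables, the check_qoh trigger, and the place_order function are hypothetical names chosen for illustration. Trigger syntax differs among DBMSs, so this shows the idea rather than PL/SQL or Transact-SQL code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProdNo TEXT PRIMARY KEY, QOH INTEGER NOT NULL);
CREATE TABLE OrderLine (ProdNo TEXT, Qty INTEGER NOT NULL);

-- Rule stored in the database: reject an order line that exceeds QOH.
CREATE TRIGGER check_qoh BEFORE INSERT ON OrderLine
WHEN NEW.Qty > (SELECT QOH FROM Product WHERE ProdNo = NEW.ProdNo)
BEGIN
    SELECT RAISE(ABORT, 'insufficient quantity on hand');
END;

INSERT INTO Product VALUES ('P1', 10);
""")

def place_order(conn, prod_no, qty):
    """Procedural code wrapping SQL: record the order line and reduce QOH."""
    try:
        with conn:  # commit if both statements succeed, roll back otherwise
            conn.execute("INSERT INTO OrderLine VALUES (?, ?)", (prod_no, qty))
            conn.execute(
                "UPDATE Product SET QOH = QOH - ? WHERE ProdNo = ?",
                (qty, prod_no),
            )
        return True
    except sqlite3.DatabaseError:
        # Raised when the trigger aborts the insert.
        return False
```

With 10 units on hand, place_order(conn, 'P1', 4) succeeds and leaves 6 units, while a later request for 20 units is refused by the trigger. The rule lives in the database, so every application that inserts order lines is subject to it.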
Control simultaneous users
Recover from failures
Major difference between enterprise and desktop DBMSs: transaction processing ability; major cost difference
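Recovery from failures can be illustrated with a transaction that either completes fully or leaves no trace. The sketch below assumes SQLite via Python's sqlite3 module; the Account table, the transfer function, and the balances are hypothetical and not taken from the textbook. It shows only the all-or-nothing behavior, not the concurrency control that an enterprise DBMS provides for many simultaneous users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (
    AcctNo  TEXT PRIMARY KEY,
    Balance INTEGER NOT NULL CHECK (Balance >= 0)
);
INSERT INTO Account VALUES ('A', 100), ('B', 50);
""")

def transfer(conn, src, dst, amount):
    """Move an amount between accounts as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if any statement fails
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE AcctNo = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE AcctNo = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative; the credit is undone too.
        return False

def balances(conn):
    """Return the current balances as a dictionary keyed by account number."""
    return dict(conn.execute("SELECT AcctNo, Balance FROM Account"))
```

A transfer of 500 from account A fails on the second statement, and the credit already applied to account B is rolled back, so the balances are the same as before the attempt.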
Database Technology Evolution
The first generation supported sequential and random searching, but the user was required to write a computer program to obtain access.
The second generation products were the first true DBMSs as they could manage multiple entity types and relationships. However, to obtain access to data, a computer program still had to be written. Second generation systems are referred to as “navigational” because the programmer had to write code to navigate among a network of linked records.
Third generation systems are known as relational DBMSs because of the foundation based on mathematical relations and associated operators. Optimization technology was developed so that access using non-procedural languages would be efficient.
Fourth generation systems can store and manipulate unconventional data types such as images, videos, maps, sounds, and animations. Because these systems view any kind of data as an object to manage, fourth generation systems are sometimes called “object-oriented” or “object-relational”. In addition to the emphasis on objects, the Internet is pushing DBMSs to develop new forms of distributed processing.
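Table: generations of database technology (columns: Era, Generation, Orientation)
1960s: first generation; file structures and proprietary program interfaces
1970s: second generation; navigational
1980s: third generation; relational
1990s: fourth generation; object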
SQL Server: strong in Windows
DB2: strong in mainframe environment
Significant open source DBMSs: MySQL, Firebird, PostgreSQL
Desktop DBMS
Access: dominates
FoxPro, Paradox, Approach, FileMaker Pro
According to the International Data Corporation (IDC), sales (license and maintenance) of enterprise database software reached $13.6 billion in 2003, a 7.6% increase since 2002. Enterprise DBMSs use mainframe servers running IBM’s MVS operating system and mid-range servers running Unix (Linux, Solaris, AIX, and other variations) and Microsoft Windows Server operating systems. Sales of enterprise database software have followed economic conditions with large increases during the Internet boom years followed by slow growth during the dot-com and telecom slowdowns. For future sales, IDC projects sales of enterprise DBMSs to reach $20 billion by 2008.
According to IDC, three products dominate the market for enterprise database software as shown in Table 1-3. The IDC rankings include both license and maintenance revenues. When considering only license costs, the Gartner Group ranks IBM with the largest market share at 35.7%, followed by Oracle at 33.4%, and Microsoft at 17.7%. The overall market is very competitive with the major companies and smaller companies introducing many new features with each release.
Open source DBMS products have begun to challenge the commercial DBMS products at the low end of the enterprise DBMS market. Although source code for open source DBMS products is available without charge, most organizations purchase support contracts so the open source products are not free. Still, many organizations have reported cost savings using open source DBMS products, mostly for non-mission-critical systems. MySQL, first introduced in 1995, is the leader in the open source DBMS market. PostgreSQL and open source Ingres are mature open source DBMS products. Firebird is a new open source product that is gaining usage.
Data Independence
Software maintenance is a large part (50%) of information system budgets
Reduce impact of changes by separating database description from applications
Change database definition with minimal effect on applications that use the database
Data Independence: a database should have an identity separate from the applications (computer programs, forms, and reports) that use it. The separate identity allows the database definition to be changed without affecting related applications.
The close association between a database and related programs led to problems in software maintenance. Software maintenance encompassing requirement changes, corrections, and enhancements can consume a large fraction of computer budgets. In early DBMSs, most changes to the database definition caused changes to computer programs. In many cases, changes to computer programs involved detailed inspection of the code, a labor-intensive process. This code inspection work is similar to year 2000 compliance where date formats must be changed to four digits. Performance tuning of a database was difficult because sometimes hundreds of computer programs had to be recompiled for every change. Because database definition changes are common, a large fraction of software maintenance resources were devoted to database changes. Some studies have estimated the percentage as high as 50% of software maintenance resources.
Schema levels:
- Internal level: implementation details for base tables (indexes, disk extents, clustering)
- Chapter 10 for physical database design
Mappings:
- Performed by the DBMS: relieve user of much work
- External to Conceptual: submit query using a view; DBMS translates to base tables
- Conceptual to Internal: SELECT statement implemented with loops, join order, index usage, …
- Use views rather than base tables in applications
- DBMS translates queries on a view to query on lower level schema
External
FacultyAssignmentFormView: data required for the form in Slide 16 (Figure 1.9)
FacultyWorkLoadReportView: data required for the report in Slide 17 (Figure 1.10)
Conceptual: tables in Slide 11
Internal
Extra files to improve performance
To make the three schema levels clearer, Table 4 shows differences among database definition at the three schema levels using examples from the features described in Section 1.2. Even in a simplified university database, the differences among the schema levels are clear. With a more complex database, the differences would be even more pronounced with many more views, a much larger conceptual schema, and a more complex internal schema.
The schema mappings describe how a schema at a higher level is derived from a schema at a lower level. For example, the external views in Table 3 are derived from the tables in the conceptual schema. The mapping provides the knowledge to convert a request using an external view (for example, HighGPAView) into a request using the tables in the conceptual schema. The mapping between conceptual and internal levels shows how entities are stored in files.
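The mappings among schema levels can be sketched with a view and an index. The example assumes SQLite via Python's sqlite3 module. HighGPAView is named in the text, but its definition here (a GPA of at least 3.5), the Student columns, the index name StdGPAIndex, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: a base table.
CREATE TABLE Student (StdNo TEXT PRIMARY KEY, StdName TEXT, StdGPA REAL);
INSERT INTO Student VALUES ('S1', 'Bob', 3.8), ('S2', 'Sue', 2.9), ('S3', 'Joe', 3.6);

-- External level: a view derived from the base table.
CREATE VIEW HighGPAView AS
    SELECT StdNo, StdName, StdGPA FROM Student WHERE StdGPA >= 3.5;
""")

def high_gpa_names(conn):
    """Application query written against the view, not the base table."""
    rows = conn.execute("SELECT StdName FROM HighGPAView ORDER BY StdName")
    return [name for (name,) in rows]

before = high_gpa_names(conn)

# Internal level: add an index. The application query is not modified.
conn.execute("CREATE INDEX StdGPAIndex ON Student (StdGPA)")

after = high_gpa_names(conn)
```

The DBMS translates the request on HighGPAView into a request on the Student table (external to conceptual mapping) and decides whether to use the new index (conceptual to internal mapping). The values of before and after are identical, which is the practical meaning of data independence: the internal schema changed and the application did not.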
Client-Server Architecture: an arrangement of components (clients and servers) and data among computers connected by a network. The client-server architecture supports efficient processing of messages (requests for service) between clients and servers.
To improve performance and availability of data, the client-server architecture supports many ways to distribute software and data in a computer network. The simplest scheme is just to place both software and data on the same computer (Figure 13(a)). To take advantage of a network, both software and data can be distributed. In Figure 13(b), the server software and database are located on a remote computer. In Figure 13(c), the server software and database are located on multiple remote computers.
Organizational Roles
Because databases are pervasive, there are a variety of ways in which you may interact with databases. The classification in Figure 14 distinguishes between functional users who interact with databases as part of their work and information systems professionals who participate in designing and implementing databases. Each box in the hierarchy represents a role that you may play. You may simultaneously play more than one role. For example, a functional user in a job such as financial analysis may play all three roles in different databases. In some organizations, the distinction between functional users and information systems professionals is blurred. In these organizations, functional users may participate in designing and using databases.
Functional users can play a passive or an active role when interacting with databases. Indirect usage of a database is a passive role. An indirect user is given a report or some data extracted from a database. A parametric user is more active than an indirect user. A parametric user requests existing forms or reports using parameters, input values that change from usage to usage. For example, a parameter may indicate a date range, sales territory, or department name. The power user is the most active. Because decision making needs can be difficult to predict, ad hoc or unplanned usage of a database is important. A power user is skilled enough to build a form or report when needed. Power users should have a good understanding of non-procedural access, a skill described in the first part of this book.
Data administrator
Both positions require more than 1 db course
- 2nd course
Summary
Nonprocedural access is a crucial feature
Many opportunities to work with databases
Working with databases: can be lucrative but very demanding
First part of textbook: fundamentals of relational databases
Other chapters in Part 1:
Chapter 2: overview of the database development process