1 Introduction
1.1 Databases vs. files
1.2 Basic concepts and terminology
1.3 Brief history of databases
1.4 Architectures & systems
1.5 Technical challenges
1.6 DB lifecycle
References: Kemper / Eickler chap. 1, Elmasri / Navathe chap. 1+2, and "Intro" of most DB books
01-DBS-Intro-2© HS-2010
1.1 Database Systems versus File-Based Processing
Example: Administration of courses, lecturers, rooms… in a university ... KVV ;)
Typical operations:
• "Find all my courses in summer term 2010"
• "Find a room with capacity >20 Friday 8 am"
• "Calculate mean number of courses for the students"
Typically interactive and batch applications
Why Database systems?
[Figure: an application program reads and writes files directly]
Reading and writing random access files in Java (taken from the Java API)
read
public int read(byte[] b, int off, int len) throws IOException
Reads up to len bytes of data from this file into an array of bytes. This method blocks until at least one byte of input is available. Although RandomAccessFile is not a subclass of InputStream, this method behaves in exactly the same way as the InputStream.read(byte[], int, int) method of InputStream.
Parameters:
b - the buffer into which the data is read.
off - the start offset of the data.
len - the maximum number of bytes read.
Returns: the total number of bytes read into the buffer, or -1 if there is no more data because the end of the file has been reached.
Throws: IOException - if an I/O error occurs.
More than 30 low-level operations
Abstraction
What is an appropriate language to manipulate data?
SELECT c.titel, c.hours
FROM Courses c, Lecturers e
WHERE c.lecturer = e.id AND e.name = 'HS'
  AND c.sem = 'SoSe2010';
Result: a table with 2 columns
titel  hours
DBS    5
TAS    4
...
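A minimal sketch of the query above, run against an in-memory SQLite database from Python. Only the table and column names come from the slide; the column types, the lecturer 'XY' and all rows are made-up sample data chosen so that the result matches the table shown.

```python
import sqlite3

# Hypothetical sample data; only table and column names are taken from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Lecturers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Courses (titel TEXT, hours INTEGER, lecturer INTEGER, sem TEXT);
    INSERT INTO Lecturers VALUES (1, 'HS'), (2, 'XY');
    INSERT INTO Courses VALUES
        ('DBS', 5, 1, 'SoSe2010'),
        ('TAS', 4, 1, 'SoSe2010'),
        ('Compiler', 4, 2, 'SoSe2010'),
        ('Seminar', 2, 1, 'WiSe2009');
""")

# The query states WHAT is wanted, not how the files are read.
query = """
    SELECT c.titel, c.hours
    FROM Courses c, Lecturers e
    WHERE c.lecturer = e.id AND e.name = 'HS' AND c.sem = 'SoSe2010'
    ORDER BY c.titel
"""
rows = conn.execute(query).fetchall()
for titel, hours in rows:
    print(titel, hours)
```

The ORDER BY clause is an addition so that the row order is deterministic.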
File system versus DBS
Why database systems?
DBS provide an abstraction from the physical representation of data and from the implementation of operations (on data)
[Figure: application programs (AProg) use a high-level interface to the DBMS, which manages the database]
SELECT name, sem
FROM Registered
WHERE subject = 'Informatik'
  AND sem >= 4;
Files versus Database: differences
– Application-oriented read/write interface, high-level access
– Database has its own data description (!) - the schema
Nonfunctional characteristics:
– More secure access
– Concurrent access to data
– Fault tolerance
1.2 Basic Concepts and Terminology
1.2.1 Data independence
• Guiding principle: introduce levels of abstraction
• Application program should be independent of physical organization of data,
  e.g. hash, B-Tree or sequential access to records should be transparent to the program (ignoring performance impacts)
Important term!
Def.: Physical Data independence
Application programs are not compromised when storage structure is changed
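Physical data independence can be tried out with a small sketch: the same query is run before and after an index is added, and the program sees the same result. The Registered table is borrowed from the earlier SQL example; its column types, the rows and the index name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Registered (name TEXT, subject TEXT, sem INTEGER)")
conn.executemany(
    "INSERT INTO Registered VALUES (?, ?, ?)",
    [("Anna", "Informatik", 5), ("Carl", "Informatik", 2), ("Tina", "Physik", 6)],
)

QUERY = "SELECT name, sem FROM Registered WHERE subject = 'Informatik' AND sem >= 4"

before = conn.execute(QUERY).fetchall()

# Change the storage structure only: add an access path (hypothetical index name).
conn.execute("CREATE INDEX idx_registered_subject ON Registered (subject, sem)")

# The application's query text is untouched and returns the same rows.
after = conn.execute(QUERY).fetchall()
print(before == after)
```

Only performance may differ between the two runs, which is exactly what the definition allows.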
Basic Abstractions
Data Independence (cont)
Example: Suppose participation in exams has to be introduced for each student in the university database
Goal: existing application programs should not need to be changed, except when logically necessary (e.g. a grade for an exam presupposes participation).
Def.: Logical data independence
Application programs are not compromised by changes of the schema (if possible)
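A sketch of logical data independence under assumed names: an existing program reads students, then the schema is extended to record exam participation, and the existing program keeps working unchanged. The table exam_participation, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, matr_nr INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO student VALUES ('Anna', 1001)")

def list_students(connection):
    """Existing application code: it names exactly the columns it needs."""
    return connection.execute(
        "SELECT name, matr_nr FROM student ORDER BY matr_nr"
    ).fetchall()

before = list_students(conn)

# Schema change: exam participation is introduced (hypothetical design).
conn.execute("""
    CREATE TABLE exam_participation (
        matr_nr INTEGER REFERENCES student(matr_nr),
        exam TEXT
    )
""")
conn.execute("INSERT INTO exam_participation VALUES (1001, 'DBS')")

# The old program is not affected by the extended schema.
after = list_students(conn)
print(before == after)
```

A program that assigns grades would have to change, since that is logically necessary.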
3-Schema Architecture
ANSI/X3/SPARC Architectural Model: “separate physical aspects from logical data structuring from individual user (application) views of the data”
Physical layer: storage model, hidden from applications
Conceptual layer: storage- and data-model-independent definition of all data in the database. Sometimes called logical layer
External views: different applications have different views of storage and data model
How to specify a database?
Important terms!
Conceptual model
Describes high-level concepts in DB design; models a subset of the real world.
e.g. Entity-relationship model (or UML: Unified Modeling Language)
[Figure: UML class diagram: Student (+name, +matr_nr) attends Course (+title, +course_Nr)]
Logical data model (DM)
e.g. CREATE TABLE student(name CHAR(..),MatrNr NUMBER,....)
Physical (data) model
Declarative ("logical") description of implementation schema
e.g. search tree: CREATE INDEX …
Def.: Database schema:
Formal description of some part of reality in terms of the data model (e.g. tables)
Schema defined on different levels: logical, physical, external
Schema:
- specifies content of database on a type level
- in most cases: schema separate from data ("schema is first class object")
- may be changed over time, but basically static
- does not exist for files (hidden in program)
Def.: Database
Set of data objects conforming to a given database schema
Database: dynamic, time variant
DB schema: basically static
Important aspect: Primitives for
- schema specification
- database operations
⇒ Data Model
1.2.2 Data models
Most important data model today: Relations (tables) and SQL (or relational algebra)
Def.: Data Model is a language
- for defining the schema (Data Definition Language DDL)
- for accessing and updating the DB (Data Manipulation Language DML)
Example (schema: the column names; data: tables, i.e. sets of rows):
Name  phone  FName  title
Kunz  33101  Bob    Prof
Hinz  33700  Cathy  Dr.
Important term!
The Relational Model
1970: Relational model [E.F. Codd: The Relational Data Model] -> reader
since 1980: RDBMS everywhere
Table name: Author
Name  Email     FName
Hunt  ...@...   Tina
Katz  ...@...   Anna
Maus  piep@...  Carl

Table name: Lecturer
Name  phone  FName  title
Kunz  33101  Bob    Prof
Hinz  33700  Cathy  Dr.

Table name: Course
Lecturer  Title     room
Hinz      DBS       05
Katz      Compiler  03
Hinz      Seminar   05
....
• All data represented as tables
• Schema ("Metadata") separate from data
Legacy data models (1)
Hierarchical data model: hierarchies of record types
Still in use: IMS (Information Management System), a mainframe oldie.
Schema: customer → account → accRecord
A bank customer has one or more accounts; for each account, there are 0 or more accounting records
Data (example): customer Kunz; accounts 198003, 198435; accounting records +100€, -22€, +112€, +720€
Legacy data model (2)
Network data model (“CODASYL”): graph-like data structures (see reader)
[Figure: schema and instances of a network database with record types SUPPLIER and PART]
Example by Codd / Date, ACM SIGFIDET 1974
Other data models (1): XML
Pre-XML representation of data:
"PO-1234","CUST001","X9876","2","14.53"
XML representation of the same data (prologue, elements, and an attribute):
<?xml version="1.0"?>
<PURCHASE_ORDER>
  <PO_NUM> PO-1234 </PO_NUM>
  <CUST_ID> CUST001 </CUST_ID>
  <ITEM ItemNum="X9876">
    <QUNTY> 2 </QUNTY>
    <PRICE> 14.53 </PRICE>
  </ITEM>
</PURCHASE_ORDER>
XML example
Graphical representation of XML data
[Figure: tree with root PURCHASE_ORDER and children PO_NUM (PO-1234), CUST_ID (CUST001) and ITEM { ItemNum=X9876 }; ITEM has children QUNTY (2) and PRICE (14.53)]
XML documents
- tree structured
- data and metadata in the same document (as opposed to RDBS)
For those who do not know what XML is: learn the basics.
Other Data models (2)
• RDF (Resource Description Framework)
  • There is a set of Nodes (call it N).
  • There is a subset of N known as the PropertyTypes (call it P).
  • There is a set of 3-tuples called T, whose elements are informally known as properties. The first item of each tuple is an element of P, the second item is an element of N and the third item is either an element of N or an atomic value (e.g. a Unicode string).
  (Core Data Model of RDF, see http://www.w3.org/TR/WD-rdf-syntax-971002/ )
• Object oriented (data) model? ... try to define.
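The RDF core data model quoted on this slide (nodes N, property types P, 3-tuples T) can be written down directly as sets. This is an illustrative sketch only: the node names, the two property types and the helper functions are invented here and are not part of RDF.

```python
# N: the set of nodes; P: the property types, a subset of N (all names are made up).
nodes = {"course:DBS", "person:HS", "prop:lecturer", "prop:title"}
property_types = {"prop:lecturer", "prop:title"}

# T: 3-tuples (property type, node, node or atomic value).
triples = {
    ("prop:lecturer", "course:DBS", "person:HS"),
    ("prop:title", "course:DBS", "Database Systems"),  # third item is an atomic string
}

def is_valid(triple):
    """First item must be in P, second in N; the third may be a node or an atomic value."""
    prop, subject, _value = triple
    return prop in property_types and subject in nodes

def values_of(prop, subject):
    """All third items of triples with the given property type and node."""
    return {v for (p, s, v) in triples if p == prop and s == subject}

print(all(is_valid(t) for t in triples))
print(values_of("prop:title", "course:DBS"))
```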
Other Data Models (3)
Lightweight database systems
– Key value stores ("schema less DBS")
– Basic ideas: very simple schema language ("key:value"), very efficient access by key, offload most correctness guarantees to application.
Example (CouchDB):
{ "_id":"biking",
  "_rev":"AE19EBC7654",
  "title":"Biking",
  "body":"My biggest hobby is mountainbiking. The other day...",
  "date":"2009/01/30 18:04:11" }
Data model
Caveat:
– Data Model is a language.
– Data Model is not the result of modeling some reality
– This process is called data modeling
– The result is the DB schema
Relational Data model: ~"Use tables to represent your data"
[Figure: data model (relational) → schema (conceptual: Student attends Course) → database (rows such as Hinz / DBS / 05, ...)]
1.2.3 Database languages
Different language levels for relational (tabular) DBS, all covered by SQL (Structured Query Language):
External level (view) and conceptual level: Data Definition (DDL) and Manipulation Language (DML)
• Define logical data structures (schema)
• Query database
Physical level: Data Administration Language
• Define access path
• Adjust tuning and other parameters
SQL and Programming languages
Programming languages
– SQL is an interactive language
– Most applications don’t allow users to use SQL directly but have their own GUI (e.g. a forms-based web interface)
– How do these applications talk to the DBS?
Embedded SQL
DBS define an Application Programming Interface (API), which is basically a standardized interface for calling the DBS from a program with the SQL command to be executed and for transferring the result data.
Most popular: Embedded SQL / C and JDBC (Java)
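To make the idea of such an API concrete, here is a sketch using Python's standard sqlite3 module as a stand-in; it is neither Embedded SQL / C nor JDBC, but follows the same pattern: the program passes an SQL command to the DBS and gets the result data back. The function name, the Courses table layout and the rows are assumptions for illustration.

```python
import sqlite3

def courses_in_semester(conn, sem):
    """Send an SQL command to the DBS and transfer the result into the program."""
    # '?' is a placeholder: the value travels separately from the SQL text.
    cursor = conn.execute(
        "SELECT titel, hours FROM Courses WHERE sem = ? ORDER BY titel", (sem,)
    )
    return cursor.fetchall()

# Hypothetical sample database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (titel TEXT, hours INTEGER, sem TEXT)")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?, ?)",
    [("DBS", 5, "SoSe2010"), ("TAS", 4, "SoSe2010"), ("Seminar", 2, "WiSe2009")],
)

result = courses_in_semester(conn, "SoSe2010")
print(result)
```

A GUI or web form would call a function like this instead of exposing SQL to the user.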
1.3 History at a glance
• Business Data Processing as the driving force for DBS development
• ~ 1965: File system approach to data management leads to chaos.
• What are the right abstractions? ⇒ data model
• 1970: Tables! (Codd's seminal paper)
• 1973: Research prototypes for Relational DBS, Transactions
• 1980: RDBMS everywhere, Distributed DBS
History (cont)
• 1990: Object orientation ⇒ OO data model and OODBMS ⇒ Object-Relational systems
• 1995: Wide scale distribution, WEB
• 1997: Semistructured data, Image DB, ... , XML / DB
• 2000++: Mobility and DBMS
• 2005++: Unstructured data, e.g. text. Querying text???
• Automated object-relational mapping: only objects in the program, don’t care about relations
1.4 Architectures and Systems
Legacy systems
– Information Management System (IMS), hierarchical system by IBM
– Universal Data Store (UDS), network system by Siemens
The dominating relational DBMS
– Oracle
– Postgres
– MySQL
– SQL-Server / Microsoft
– Sybase
– DB2 / IBM, Informix
– Adabas (Software AG)
– personal, low cost desktop DBS: MS Access
– Java "persistence" related DBS: Derby, ...
Integrated systems
More and more integration with application software, e.g. SAP R/3 uses Oracle (mostly) behind the curtains
[Figure: ERP system (Enterprise Resource Planning): clients (Corba client, Java client, ...) on top of a business framework (objects, rules) with components (e.g. Controlling, Financial Accounting) on top of the database]
Mainframe
• Mainframe architecture
– Transaction monitor queues requests, schedules application programs (usually simple application logic)
– Still in use today, e.g. flight reservation systems
– Very efficient, but expensive hardware: typically "big iron" (a really big machine)
[Figure: terminals → transaction monitor → applications → DBS → operating system]
2-tier Architecture
Two-tier architecture
– Typically used with 4GL (“Fourth Generation Languages”), i.e. languages for easy development of simple form-based applications and reports
– Transaction support through database system
– Used in medium-size applications
[Figure: client workstation (presentation, requests, GUI) ↔ database server, connected by a proprietary protocol]
Three-tier Architecture (1)
Application-oriented architecture: separation of presentation, application logic and DB access
e.g. CGI or Servlet application running under control of a web server
[Figure: browser → (http) → presentation (e.g. web server) with application programs → (JDBC / ESQL / ODBC) → database server]
Three-tier Architecture (2)
Middle tier: framework for implementing business logic and business objects
Particularly useful with automatic object-relational mapping between database (relational) and programming language (object oriented)
Application server: implements presentation, session control and business logic (applications)
Data access transparent for application programs
[Figure: clients (HTML client, e.g. Java applet) → web server → application server → database server]
1.5 Technical challenges
Operational requirement: The DBS should never do anything which destroys the consistency of database and modeled reality (called integrity)
Example: Transfer 100 $ from one account a1 to another one a2. Several steps are required: read the value of a1, decrease it by the amount (100 $), write a1, increase the value of a2 by the amount.
Main technical issue: Execution of operations must guarantee correctness properties
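One way a DBS guarantees this is to run the steps of the transfer as a single transaction: either all of them take effect or none. A minimal sketch with SQLite follows; the account table, the CHECK constraint (no negative balances) and the starting balances are assumptions, and the credit is deliberately done first so that the rollback is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a1", 150), ("a2", 50)])
conn.commit()

def transfer(connection, src, dst, amount):
    """Move amount from src to dst; both updates happen or neither does."""
    try:
        with connection:  # commits on success, rolls back if an exception is raised
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(connection):
    return dict(connection.execute("SELECT id, balance FROM account"))

ok = transfer(conn, "a1", "a2", 100)      # a1: 150 -> 50, a2: 50 -> 150
failed = transfer(conn, "a1", "a2", 100)  # would overdraw a1, so the credit to a2 is undone
print(ok, failed, balances(conn))
```

Without the transaction, the failed second transfer would leave a2 credited although a1 was never debited.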
Technical challenges
Operational requirement: No interference of operations of different users
Example: Auction system. Two independent bidders A and B read the highest bid h. B's bid: h+a, A's bid: h+b.
If A writes last, B's bid is lost, even if it is the higher one (h+a > h+b).
A and B are the programs executing the bids for human users.
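The interference in the auction example can be replayed deterministically. The sketch below is an illustration, not a real auction system: the class, the method names and the values h = 100, a = 30, b = 10 are assumptions, chosen so that the overwritten bid is the higher one.

```python
class Auction:
    """Holds the current highest bid (hypothetical in-memory stand-in for a DB record)."""

    def __init__(self, highest):
        self.highest = highest

    def read(self):
        return self.highest

    def write(self, value):
        self.highest = value

    def bid_if_higher(self, value):
        """Check and write as one indivisible step, as synchronization must guarantee."""
        if value > self.highest:
            self.highest = value

h, a, b = 100, 30, 10

# Unsynchronized interleaving: both programs read h before either writes.
auction = Auction(h)
seen_by_a = auction.read()
seen_by_b = auction.read()
auction.write(seen_by_b + a)  # B bids h+a = 130
auction.write(seen_by_a + b)  # A bids h+b = 110 and overwrites B's bid
lost_update_result = auction.highest

# Same bids, but each check-and-write runs without interference.
safe = Auction(h)
safe.bid_if_higher(h + a)
safe.bid_if_higher(h + b)
safe_result = safe.highest

print(lost_update_result, safe_result)
```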
Technical Challenges
Synchronisation of independent DB users:
How to avoid conflicting read / write access? ⇒ concurrent programming
But DBs have many resources: each record is a resource; there may be millions (*) of them
Synchronization of thousands of concurrent operations?
(*) Wal-Mart: 200 million transactions / week ≈ 330 TA/sec, 24/7 (source: The Economist, Feb 27, 2010)
Technical challenges
Fail-safe operation
– Example: System crash when writing a block with account data to disk. DB must not be corrupted
– System failure should not corrupt database state
Efficiency
– Hundreds of clients active on the same DB
– Hundreds or thousands of operations / sec
– Response time requirement in interactive environment: < 3 sec
Data security
– Access by unauthorized users might be a disaster
1.6 Lifecycle
Phases:
– Requirements analysis
– Conceptual design
– Schema design, logical (“create tables”)
– Schema design, physical (“create access path”)
– Loading, administration, tuning, maintenance, reorganization
Roles involved: system analyst, DB designer, application programmer, DB administrator
Compare: Lifecycle of HW ~ 3 years, software ~ 5 years, data 30 years!?
Summary
• Database ≠ Database System
• Database: data and metadata (schema)
• Data model: high level data definition and data manipulation language
• Relational Data Model (RDM) / SQL
• Two- / Three-tier architecture
• Technical requirements: Concurrency, Fault tolerance, Integrity, Efficiency
• Life cycle