BIS 512 - Database Management Systems
Ahmet Onur Durahimhttp://www.mis.boun.edu.tr/durahim/
Learning Objectives
• Database systems concepts
• Designing and implementing a database application
• Life of a Query in a Database System
Database Systems
• Data?
– Streams of raw facts representing events occurring in organizations
• Information?
– Data shaped into a meaningful form that is useful to human beings
• Database
– Organized collection of data
Information vs. Data
• Raw data is processes and organized to produce meaningful and useful information– Total unit sales, total sales revenue
Database Management System (DBMS)
• DBMS contains information about a particular enterprise– Collection of interrelated data– Set of programs to access the data– An environment that is both convenient and efficient
to use
• Database Applications:– Banking: all transactions– Airlines: reservations, schedules– Universities: registration, grades– Sales: customers, products, purchases
Transaction Processing Systems
• Serves operational managers and staff– Keep track of the elementary activities and transactions of
the organization (sales, cash deposits, flow of materials)
– Monitor the status of internal operations and the firm’s relations with the external environment
• Perform and record daily routine transactions necessary to conduct business– Sales order entry, payroll, and shipping
• Serve predefined, structured goals, tasks and resources at the operational level
• Major producers of information for the other systems and business functions
Transaction Processing Systems
Data Management Systems• Responsible for organizing and managing firm’s data so
that they can be efficiently accessed and used• Allows the definition, creation, querying, update, and
administration of databases (Transactions, Persistence of data, Recovery, Concurrency control)
Database Software Providers– Commercial
• Microsoft SQL Server, Oracle, IBM DB2, Sybase
• These 4 companies supply more than 90% of US DB market
– Open Source• MySQL, PostgreSQL
Data Management and Storage
• NoSQL and Big Data Systems
– Amazon Dynamo, Cassandra, MongoDB, Neo4j
– Apache Hadoop
Why do we need DBMSs?
• Computers were originally developed for number crunching– By time, data storage and processing became as
important as scientific computing
• Amount and types of data increased– Image/audio/video data
– Genome data
– Customer transactions
• DBMSs were developed to manage this data
Why do we need DBMSs?
• Data independence
• Efficient access
• Data integrity and security
• Uniform data administration
• Concurrent access
• Recovery from crashes
• User-friendly declarative query language
Terminology and Basic Ideas• Data Model: Describes conceptual structuring of data
stored in database– data model is set of records. (records might each have student-
ID and name)
• The relational model of data is the most widely used model today– Main concept: relation, basically a table with rows and
columns– Every relation has a schema, which describes the columns, or
fields
• Schema vs. Data– Schema describes how data is to be structured
• defined at set-up time, rarely changes• part of the "metadata"
– Data is actual "instance" of database, may change rapidly
Relational Model
Relational Model Concepts
Terminology and Basic Ideas
• Data Definition Language (DDL)
– Commands for setting up schema of database
– Process of designing schema can be complex
• may use design methodology and/or tool
• Data Manipulation Language (DML)
– Commands to manipulate data in database: RETRIEVE, INSERT, DELETE, MODIFY
– Also called "query language"
Database Design
• The process of designing the general structure of the database– Logical Design: Decide on the “good” collection of the
relation schemas
– Logical modeling deals with gathering business requirements and converting those requirements into a model• Business decision: What attributes should we record in the
database?
• IS decision: What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
Database Design• The process of designing the general structure of
the database– Physical Design: Decide on the physical layout of the
database– Physical modeling deals with the conversion of the
logical, or business model, into a relational database model
– database software specific• The objects defined during physical modeling can vary
depending on the relational database software being used. • Most relational database systems have variations with the
way data types are represented and the way data is stored, although basic data types are conceptually the same among different implementations.
Terminology and Basic Ideas
• Data Definition Language (DDL)
– Commands for setting up schema of database
– Process of designing schema can be complex
• may use design methodology and/or tool
• Data Manipulation Language (DML)
– Commands to manipulate data in database: RETRIEVE, INSERT, DELETE, MODIFY
– Also called "query language"
Structured Query Language (SQL)• Widely used non-procedural database query language
• Question: What is the name, email address and yearly income of the customer with customerKey ‘11009’?
Customer TableDimCustomer
Components of a DBMS
• Database system has five main pieces;– Client Communications Manager: manages
communication between users and the database
– Process manager: encapsulates and schedules the various tasks in the system
– A statement-at-a-time query processing engine
– A shared transactional storage subsystem: knits together storage, buffer management, concurrency control and recovery
– A set of shared utilities: memory management, disk space management, replication, and various batch utilities used for administration
Architectural Components of a DBMS
Life of a Query
• A database interaction at an airport, in which a gate agent clicks on a form to request the passenger list for a flight
• This button click results in a single-query transaction
Life of a Query• The personal computer at the airport gate (the “client”)
calls an API that in turn communicates over a network to establish a connection with the Client Communications Manager of a DBMS– the responsibility of the DBMS’ client communications manager
• to establish and remember the connection state for the caller• to respond to SQL commands from the caller• to return both data and control messages (result codes, errors, etc.)
as appropriate
• In gate agent’s query example, the communications manager would – establish the security credentials of the client– set up state to remember the details of the new connection and
the current SQL command across calls– forward the client’s first request deeper into the DBMS to be
processed
Life of a Query• Upon receiving the client’s first SQL command, the DBMS
(Process Manager) must assign a “thread of computation” to the command– Make decision regarding admission control: whether the system
should begin processing the query immediately, or defer execution until a time when enough system resources are available to devote to this query
• Begin executing the gate agent’s query by Relational Query Processor– checks that the user is authorized to run the query, and
compiles the user’s SQL query text into an internal query plan– The plan executor consists of a suite of “operators” (relational
algorithm implementations) for executing any query• Operators implement relational query processing tasks including
joins, selection, projection, aggregation, sorting, etc.
Life of a Query• One or more operators exist to request data from the
database. These operators make calls to fetch data from the DBMS’ Transactional Storage Manager– Manages all data access (read) and manipulation (create,
update, delete) calls
– Invoke the transaction management code to ensure the well-known “ACID” properties of transactions
– Before accessing data;• locks are acquired from a lock manager to ensure correct
execution in the face of other concurrent queries.
• If the gate agent’s query involved updates to the database, it would interact with the log manager to ensure that the transaction was durable if committed, and fully undone if aborted
Life of a Query• Agent’s query has begun to access data records, and is
ready to use them to compute results for the client.– done by “unwinding the stack” of activities
• The access methods return control to the query executor’s operators– which orchestrate the computation of result tuples from database data
• as result tuples are generated, they are placed in a buffer for the client communications manager– which ships the results back to the caller.
• At the end of the query: the transaction is completed and the connection is closed– transaction manager cleaning up state for the transaction
– process manager freeing any control structures for the query
– communications manager cleaning up communication state for the connection
Life of a Query• Shared components and utilities that are vital to
the operation of a full-function DBMS • The catalog and memory managers are invoked
as utilities during any transaction – The catalog is used by the query processor during
authentication, parsing, and query optimization– The memory manager is used throughout the DBMS
whenever memory needs to be dynamically allocated or deallocated
• Remaining modules that run independently of any particular query, keeping the database as a whole well tuned and reliable