Inf 523 Fundamentals of Information Technology: Databases (Fall 2011)

Jagdish S. Gangolly
Informatics

CCI
SUNY Albany

September 9, 2011

Jagdish S. Gangolly Informatics CCI SUNY Albany Inf 523 Fundamentals of Information Technology: Databases (Fall 2011)

What is a Database?

A database is an

- integrated collection of
- logically related records or files
- consolidated into a common pool that
- provides data for one or more uses

Source: Wikipedia (http://en.wikipedia.org/wiki/Database)

Examples – By Application

- Accounting (Inventory, Asset management, Payroll/Human Resources, ...)
- Marketing
- Flight reservations
- Census
- Library catalog
- Bibliographic databases
- Genealogy databases
- Patient management
- ...

Examples – By Type of Data

- Full text (Lexis/Nexis, WestLaw, ...)
- Images (Picasa, maps.google, ...)
- Bibliographic (CiteSeer, arXiv, ...)
- Numeric (Accounting, Marketing, ...)

Examples – By Underlying Model

- Flat file (Spreadsheets, ...)
- Hierarchical (IMS, ...)
- Network (IDMS, ...)
- Relational & Object-Relational (MS-Access, Oracle, DB2, Informix, ...)
- Object (Objectivity/DB, ObjectStore, POET, JADE, ...)
- NoSQL (Column-based: Hadoop/HBase, Cassandra, Hypertable; Document stores: MongoDB, CouchDB; Key-value stores: Membase, Levelbase; Graph databases: InfoGrid, Neo4j, HyperGraphDB)

SQL vs. NoSQL Databases

- SQL databases require table schemas to be defined first; not so with NoSQL databases
- SQL databases usually involve joining tables; not so with NoSQL databases
- SQL databases generally do not scale well horizontally (that is, when new nodes/machines are added); NoSQL databases are typically designed to scale well horizontally
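
The schema-first point can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `movies` table and the `documents` dictionary are assumptions made for illustration; the dictionary is only a stand-in for a document store, not a real NoSQL system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational side: inserting before the table is defined fails.
try:
    conn.execute("INSERT INTO movies (title, year) VALUES ('Alien', 1979)")
    insert_without_schema_failed = False
except sqlite3.OperationalError:
    insert_without_schema_failed = True

# Once the schema exists, the same insert succeeds.
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
conn.execute("INSERT INTO movies (title, year) VALUES ('Alien', 1979)")
row_count = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]

# Schemaless side (hypothetical stand-in for a document store): a dict keyed
# by id, where each document may carry different fields.
documents = {}
documents["m1"] = {"title": "Alien", "year": 1979}
documents["m2"] = {"title": "Heat", "cast": ["Al Pacino", "Robert De Niro"]}
```

The two documents have different fields, and nothing had to be declared beforehand; the relational insert needed `CREATE TABLE` first.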

Traditional Databases

Figure: Traditional Databases (non-integrated)

Why Databases?

- Data independence (data defined independently of applications)
- Minimal data redundancy (minimum duplication of data)
- Ease of data access for users (users do not need to know how data is stored, only what the data means)
- Enforcement of standards & security over data, and enhanced integrity of data

Modern Databases

Figure: Modern Databases (integrated)

Modern Databases – Architecture

Figure: Modern Databases – Architecture

Features of Modern Database Systems/Transaction Processing Systems

- High availability
- High reliability
- High throughput
- Low response time
- Long lifetime
- Security

Players in Database Systems

- User
- Database designer
- Database administrator
- Systems analyst
- Applications programmer
- Project manager
- System administrator

Relational Databases

- Relations (tables, which have a schema), e.g., Movies(title, year, length, genre)
- Tuples (rows)
- Attributes (columns)
- Relation as predicate (a declarative statement that is true or false)
- Operations on tables
  - Set operations on relations (union, intersection, difference)
  - Projection (selection of some columns)
  - Selection (selection of some rows based on some criterion)
  - Cartesian product (cross product) of relations
  - Joins of relations
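
The operations listed above can be sketched in SQL through Python's sqlite3 module. This is a minimal illustration, not a complete treatment: the `StarsIn` relation and all sample rows are assumptions added for the example; only the `Movies` schema comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, genre TEXT);
    CREATE TABLE StarsIn (title TEXT, star TEXT);  -- hypothetical second relation
    INSERT INTO Movies VALUES
        ('Alien', 1979, 117, 'sci-fi'),
        ('Heat', 1995, 170, 'crime'),
        ('Toy Story', 1995, 81, 'animation');
    INSERT INTO StarsIn VALUES
        ('Alien', 'Sigourney Weaver'),
        ('Heat', 'Al Pacino');
""")

def q(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Projection: keep only some columns.
projection = q("SELECT DISTINCT title, year FROM Movies ORDER BY title")

# Selection: keep only the rows that satisfy a criterion.
selection = q("SELECT title FROM Movies WHERE year = 1995 ORDER BY title")

# Set operation: union of two selections over the same schema.
union = q("""
    SELECT title FROM Movies WHERE genre = 'crime'
    UNION
    SELECT title FROM Movies WHERE length < 100
    ORDER BY title
""")

# Cartesian product: every Movies row paired with every StarsIn row.
product_size = q("SELECT COUNT(*) FROM Movies CROSS JOIN StarsIn")[0][0]

# Join: only the pairs whose titles match.
join = q("""
    SELECT Movies.title, StarsIn.star
    FROM Movies JOIN StarsIn ON Movies.title = StarsIn.title
    ORDER BY Movies.title
""")
```

The join is the Cartesian product restricted to matching titles, which is why it returns two rows while the product has six.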

SQL: Top-Level Overview

- Data Definition Language (DDL)
  - CREATE TABLE
  - DROP TABLE
- Data Manipulation Language (DML)
  - SELECT ... FROM ... WHERE
  - DELETE
  - INSERT
  - UPDATE
- Data Control Language (DCL)
  - GRANT
  - REVOKE
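
A short sketch of the DDL and DML statements above, run through Python's sqlite3 module. The sample rows are illustrative assumptions. SQLite does not implement GRANT/REVOKE, so the DCL statement appears only as a string, and `some_user` is a hypothetical account name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table.
conn.execute(
    "CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, genre TEXT)"
)

# DML: insert, update, delete, and query rows.
conn.execute("INSERT INTO Movies VALUES ('Alien', 1979, 117, 'sci-fi')")
conn.execute("INSERT INTO Movies VALUES ('Heat', 1995, 170, 'crime')")
conn.execute("UPDATE Movies SET genre = 'thriller' WHERE title = 'Heat'")
conn.execute("DELETE FROM Movies WHERE year < 1990")
remaining = conn.execute(
    "SELECT title, genre FROM Movies WHERE year = 1995"
).fetchall()

# DDL again: remove the table definition together with its data.
conn.execute("DROP TABLE Movies")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# DCL is not implemented by SQLite; on a server DBMS a statement would look
# like the text below (shown as a string only, never executed here).
dcl_example = "GRANT SELECT ON Movies TO some_user"
```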

Relational Algebra, Relational Calculus, and the Codd Theorem

- Relational algebra lists the possible operations on tables and then formulates queries in terms of those operations. It is therefore a procedural way of specifying queries.
- Relational calculus, on the other hand, is a descriptive or declarative way of specifying queries.
- Codd's theorem states that the query language based on the relational algebra is as expressive as the query language based on the relational calculus. In other words, it says that "the relational algebra and the relational calculus are essentially logically equivalent: for any algebraic expression, there is an equivalent expression in the calculus, and vice versa" (http://en.wikipedia.org/wiki/Relational_calculus).
- E. F. Codd called languages with the expressive power of relational algebra relationally complete.
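
The contrast between the two styles can be sketched in plain Python, with a relation modeled as a set of tuples. The helper names `select` and `project` and the sample rows are assumptions for illustration; the set comprehension only mimics the declarative flavor of the calculus and is not a calculus implementation.

```python
# Relation Movies(title, year, length, genre) as a set of tuples (sample data).
Movies = {
    ("Alien", 1979, 117, "sci-fi"),
    ("Heat", 1995, 170, "crime"),
    ("Toy Story", 1995, 81, "animation"),
}

def select(relation, predicate):
    """Relational-algebra selection: keep the tuples satisfying predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, *positions):
    """Relational-algebra projection: keep only the given column positions."""
    return {tuple(t[i] for i in positions) for t in relation}

# Algebra style: an explicit sequence of operations (select, then project).
algebra_result = project(select(Movies, lambda t: t[1] == 1995), 0)

# Calculus style: describe the wanted tuples,
#   { (title) | there is a movie m with m.title = title and m.year = 1995 },
# without naming a sequence of operators.
calculus_result = {(m[0],) for m in Movies if m[1] == 1995}
```

Both formulations denote the same set of titles, which is the kind of equivalence Codd's theorem asserts in general.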

Relational Algebra, Relational Calculus, and the Codd Theorem

- Relational algebra is an imperative, variable-free language, while relational calculus is a logical language with variables and quantification.
- That a language is relationally complete does not mean that any query can be formulated in it. For example, aggregations (such as sums or averages) cannot be expressed in relational algebra, but can be expressed in SQL.
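
As an illustration of the aggregation point, here is a small SQL query with COUNT, SUM, and AVG, run through Python's sqlite3 module. The sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movies (title TEXT, year INTEGER, length INTEGER, genre TEXT);
    INSERT INTO Movies VALUES
        ('Alien', 1979, 117, 'sci-fi'),
        ('Heat', 1995, 170, 'crime'),
        ('Toy Story', 1995, 81, 'animation');
""")

# Aggregation: one output row per year, summarizing all movies of that year.
stats = conn.execute("""
    SELECT year, COUNT(*), SUM(length), AVG(length)
    FROM Movies
    GROUP BY year
    ORDER BY year
""").fetchall()
```

Each output row summarizes a whole group of tuples, which the basic algebra operators (selection, projection, product, set operations) do not do.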

Data Models

- Relational and Object-Relational (Structured)
- Semi-structured (XML)
- Unstructured (mostly for images)

Transaction Processing

The tasks performed in transaction processing include

- Logging (facilitates recovery in case of failure/crash)
- Concurrency control, to ensure isolation of transactions
- Deadlock resolution

The important properties (ACID) that must be met by transactions are

- Atomicity: Each transaction is executed completely or not at all
- Consistency: Each transaction maintains database consistency
- Isolation: The concurrent execution of a set of transactions has the same effect as some serial execution of that set
- Durability: The effects of committed transactions are permanently recorded in the database
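
Atomicity and consistency can be sketched with a funds transfer in Python's sqlite3 module. The `account` table, the names, and the amounts are assumptions for illustration; isolation and durability are not demonstrated by this single-connection, in-memory example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # CHECK fails; credit is undone too
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit to `bob` has already been applied when the debit violates the CHECK constraint; the rollback undoes it, so the transaction takes effect completely or not at all.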

Semantic Data Modeling

- Entity-Relationship Diagram (ERD)
  - Entity
    - Weak entities
    - Strong entities
  - Relationship
    - Degree of a relationship (binary, ternary, n-ary)
    - Cardinality of a relationship
  - Attributes
    - Identifier

Entity-Relationship Data Model

- Entities are denoted by rectangles and relationships by lines connecting the entities
- Cardinalities of the relationships reflect the business rules & policies that govern the operations of the organisation for which the database is designed
- Cardinalities express the number of relationships with other entities in which an entity instance can participate
- The cardinalities depend on the exact definition of the entities
- Cardinalities must be specified at both ends of each relationship

Cardinalities

- Mandatory One: An entity instance MUST participate in exactly one relationship (i.e., cardinality of 1)
- Optional One: An entity instance MAY participate in one relationship (i.e., cardinality of 0 or 1)
- Mandatory Many: An entity instance MUST participate in at least one relationship (i.e., cardinality of 1...∞)
- Optional Many: An entity instance MAY participate in zero or more relationships (i.e., cardinality of 0...∞)
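
One common way to carry the "one" cardinalities into a relational schema is through column constraints; the sketch below uses Python's sqlite3 module. The tables `department`, `parking_space`, and `employee` are hypothetical and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY);
    CREATE TABLE parking_space (space_id INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        -- Mandatory one: every employee belongs to exactly one department.
        dept_id  INTEGER NOT NULL REFERENCES department(dept_id),
        -- Optional one: an employee may have at most one parking space.
        space_id INTEGER UNIQUE REFERENCES parking_space(space_id)
    );
    INSERT INTO department VALUES (10);
    INSERT INTO employee VALUES (1, 10, NULL);
""")

def try_insert(sql):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

no_department = try_insert("INSERT INTO employee VALUES (2, NULL, NULL)")
bad_department = try_insert("INSERT INTO employee VALUES (3, 99, NULL)")
no_parking = try_insert("INSERT INTO employee VALUES (4, 10, NULL)")
```

Seen from the department side, the same foreign key gives "optional many": a department may have zero or more employees. A "mandatory many" rule (at least one) usually cannot be stated with simple column constraints and tends to be enforced by programs or triggers.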

Business Rules & Policies

- It is important to incorporate business rules and policies governing the data into the data model in order to maintain the integrity of the database
- It is the job of the database analyst to understand those rules and policies and implement them in the database design
- Only some of the business rules & policies are represented in the cardinalities. Others are incorporated into databases through programs
- Business rules & policies must be: declarative, precise, atomic, consistent, expressible, distinct, & business-oriented

Entities

- An entity is a set of objects that share common properties. The objects (persons, things, ...) belonging to such a set are called entity instances
- It is a convention to use singular names for entities and to show them in upper case letters (EMPLOYEE, ORDER, ITEM, SHIPMENT, ...)
- Entities may be strong or weak. They are strong if their existence does NOT depend on other entities. They are weak if their existence depends on other entities. For example, a COURSE is a strong entity but a COURSE-SECTION is weak because it cannot exist independently of the course that it is a section of. Weak entities are represented by framed rectangles.
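
The COURSE / COURSE-SECTION example can be sketched as tables in Python's sqlite3 module. The column names, the course code, and the choice of `ON DELETE CASCADE` are assumptions for illustration; other designs may forbid deleting a course that still has sections.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE course (course_id TEXT PRIMARY KEY);   -- strong entity
    CREATE TABLE course_section (                       -- weak entity
        course_id  TEXT NOT NULL
                   REFERENCES course(course_id) ON DELETE CASCADE,
        section_no INTEGER NOT NULL,
        -- A section is identified only together with its owning course.
        PRIMARY KEY (course_id, section_no)
    );
    INSERT INTO course VALUES ('INF523');
    INSERT INTO course_section VALUES ('INF523', 1), ('INF523', 2);
""")

sections_before = conn.execute("SELECT COUNT(*) FROM course_section").fetchone()[0]

# Deleting the course removes its sections: they cannot exist on their own.
conn.execute("DELETE FROM course WHERE course_id = 'INF523'")
sections_after = conn.execute("SELECT COUNT(*) FROM course_section").fetchone()[0]

# A section of a course that does not exist is rejected.
try:
    conn.execute("INSERT INTO course_section VALUES ('NOPE', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```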

Attributes

- Attributes are measured characteristics of entities. Database management systems provide us with many data types to represent attributes
- An attribute may be simple, as in the case of EmployeeName, or composite, as is the case with EmployeeAddress, which may be composed of street address, city, state and zip code. Simple attributes are atomic and cannot be broken down, but composite attributes have internal structure
- Attributes may be single-valued or multi-valued.
- Attributes may be stored or derived. If they are derived, they need not be stored since they can be computed from the values of stored attributes. For example, if the price and quantity are known, one can compute the amount by multiplying the two. Storing the amount in the database would be redundant
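
The price-and-quantity example of a derived attribute can be sketched as a query that computes the amount on demand instead of storing it. The `order_line` table and its rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only the stored attributes are columns; amount is not stored.
    CREATE TABLE order_line (item TEXT, price REAL, quantity INTEGER);
    INSERT INTO order_line VALUES ('widget', 2.5, 4), ('gadget', 10.0, 3);
""")

# The derived attribute is computed from the stored ones at query time.
amounts = conn.execute("""
    SELECT item, price * quantity AS amount
    FROM order_line
    ORDER BY item
""").fetchall()
```

Because the amount is always recomputed, it cannot drift out of step with price and quantity after an update.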

Attributes

- The identifier of an entity is an attribute or a set of attributes that uniquely identifies an entity instance. Examples include employeeSocialSecurityNumber, vehicleID, ...
- A composite identifier is an identifier that consists of a composite attribute. An example is flightID, which consists of Flight Number and Date
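
The flightID example can be sketched as a composite primary key in Python's sqlite3 module. The column names and the flight number `XY100` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flight (
        flight_number TEXT NOT NULL,
        flight_date   TEXT NOT NULL,
        -- Neither column is unique alone; the pair identifies a flight.
        PRIMARY KEY (flight_number, flight_date)
    )
""")
conn.execute("INSERT INTO flight VALUES ('XY100', '2011-09-09')")
conn.execute("INSERT INTO flight VALUES ('XY100', '2011-09-10')")  # same number, new date

# Repeating an existing (number, date) pair violates the identifier.
try:
    conn.execute("INSERT INTO flight VALUES ('XY100', '2011-09-09')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

flight_count = conn.execute("SELECT COUNT(*) FROM flight").fetchone()[0]
```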

Relationships

- A relationship type is an association between two or more entities (entity types). A relationship instance is an association between the corresponding entity instances.
- If the relationship type is many-to-many (crow's foot at both ends), then the relationship can have its own attributes. In database design, we convert such relationships into (associative) entities, which are weak entities since their existence depends on the existence of the entities at both ends of the many-to-many relationship.
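
Converting a many-to-many relationship into an associative entity can be sketched as follows with Python's sqlite3 module. The STUDENT/COURSE/ENROLLMENT example, the course codes, and the `grade` attribute are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY);

    -- Associative entity: one row per (student, course) pair. It depends on
    -- both ends and carries the relationship's own attribute, grade.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  TEXT    NOT NULL REFERENCES course(course_id),
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Ann'), (2, 'Raj');
    INSERT INTO course VALUES ('INF523'), ('INF528');
    INSERT INTO enrollment VALUES
        (1, 'INF523', 'A'), (1, 'INF528', 'B'), (2, 'INF523', 'A');
""")

# One student, many courses.
ann_courses = conn.execute("""
    SELECT course_id, grade FROM enrollment
    WHERE student_id = 1 ORDER BY course_id
""").fetchall()

# One course, many students.
inf523_students = conn.execute("""
    SELECT name FROM student JOIN enrollment USING (student_id)
    WHERE course_id = 'INF523' ORDER BY name
""").fetchall()
```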

Relationships

- Degree of a relationship: A relationship can be between more than two entities. For example, if it is between three entities, it is called a ternary relationship. In general, a relationship can be n-ary, between n entities. On the other hand, a relationship can be between an entity and itself, and such relationships are called unary (or recursive) relationships.
- An example of an n-ary relationship:

  Delivery of items to customers by drivers on trucks
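
Both a unary and an n-ary relationship can be sketched as tables with Python's sqlite3 module. The `employee` manager example and all table and column names are assumptions for illustration; only the delivery scenario comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Unary (recursive) relationship: an employee is managed by an employee.
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        manager_id INTEGER REFERENCES employee(emp_id)
    );
    INSERT INTO employee VALUES (1, NULL), (2, 1);

    -- n-ary relationship: one delivery ties together four entities.
    CREATE TABLE item     (item_id     INTEGER PRIMARY KEY);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE driver   (driver_id   INTEGER PRIMARY KEY);
    CREATE TABLE truck    (truck_id    INTEGER PRIMARY KEY);
    CREATE TABLE delivery (
        item_id     INTEGER NOT NULL REFERENCES item(item_id),
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        driver_id   INTEGER NOT NULL REFERENCES driver(driver_id),
        truck_id    INTEGER NOT NULL REFERENCES truck(truck_id),
        PRIMARY KEY (item_id, customer_id, driver_id, truck_id)
    );
    INSERT INTO item VALUES (1);
    INSERT INTO customer VALUES (1);
    INSERT INTO driver VALUES (1);
    INSERT INTO truck VALUES (1);
    INSERT INTO delivery VALUES (1, 1, 1, 1);
""")

# Self-join: pair each managed employee with their manager.
managers = conn.execute("""
    SELECT e.emp_id, m.emp_id
    FROM employee AS e JOIN employee AS m ON e.manager_id = m.emp_id
""").fetchall()

# A delivery that names a truck that does not exist is rejected.
try:
    conn.execute("INSERT INTO delivery VALUES (1, 1, 1, 99)")
    unknown_truck_rejected = False
except sqlite3.IntegrityError:
    unknown_truck_rejected = True
```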
