Data Modeling 1

Date post: 10-Jan-2016
Upload: bishnupriya-panda

Transcript
  • Data Modeling Concepts - Coverage

    What is Data Modeling, Why Model?

    Data Modeling Terminology

    Logical and Physical Model

    Normalization and De-Normalization

    Dimensional Modeling

    Star and Snowflake Schema

    Fact and Dimension Tables

    Relational vs. Dimensional Modeling

    ERwin CASE Tool

    Data Modeling Concepts

  • "Model"?

    A model is a symbolic or abstract representation of something real or imagined.

  • What is Data Modeling?

    Data modeling is a method used to define and analyze the data requirements needed to support the business processes of an organization.

    A data model is a conceptual representation of the data structures (tables) required for a database.

    It is powerful in expressing and communicating business requirements.

    It visually represents the nature of the data, the business rules governing the data, and the way the data is organized in the system.

  • Why Data Models?

  • Why Data Models? (contd.)

    Business rules and processes change over time, so a small business change can lead to large changes in the computer systems.

    Entity types are incorrectly identified, which leads to data replication.

    Data is not standardized, so it cannot be shared with customers or internal management.

  • Data Model

    Planned House: House = System, Blueprint = Data Model

    A house built on a good blueprint can be used for many purposes.

    A system built on a good data model can support new ways of doing business, new lines of business, even new businesses, without throwing out the system.

  • Data Model Basic Building Blocks

    Entity: Anything about which data will be collected/stored

    Attribute: Characteristic of an entity

    Relationship: Describes an association among entities
      One-to-one (1:1) relationship
      One-to-many (1:M) relationship
      Many-to-many (M:N or M:M) relationship

    Constraint: A restriction placed on the data

  • Business Rules

    Brief, precise and unambiguous descriptions of policies, procedures or principles within the organization

    Describe characteristics of the data as viewed by the company

  • Translating Business Rules to Data Model Components

    Standardize the company's view of data

    Communication tool between users and designers

    Allow the designer to understand the nature, role and scope of data

    Allow the designer to understand business processes

    Allow the designer to develop appropriate relationship participation rules and constraints

    Promote creation of an accurate data model

    Nouns translate into entities

    Verbs translate into relationships among entities

    Relationships are bi-directional

  • Typical Data Flow

    [Diagram: data warehouse architecture. Labels include Source Systems (Internal Source System, External Source System), Operational Data, ETL, Staging Area, Data Integration Layer, Data Warehouse, Data Mart Builder, Data Marts, Other Custom Databases, Metadata, Delivery, and a Reporting / Querying / OLAP Viewing Layer (Ad hoc Querying / Reporting / Viewing, OLAP Analysis, Alerts, Performance Reporting, Client Reporting, Analysts Portal, External Portals, Marketing and Sales Teams).]

    Data Modeling: Important in Staging, DWH and Data Marts

  • Data Modeling Terminology

    ENTITY: A person, object, place or event for which data is collected

    ATTRIBUTE: A parameter that defines a property of an entity

    RELATIONSHIP: A business rule that determines how entities interact with each other, e.g. Sales Representative SERVES Customer

    CARDINALITY: Defines the relationship between the entities in terms of numbers
      OPTIONAL: A Sales Representative could have zero or many Customers
      MANDATORY: At least one Product should be listed in an Order

  • Data Modeling Terminology contd.

    PRIMARY KEY: Column(s) that uniquely identify each record in a table

    FOREIGN KEY: Column(s) in one table that refer to column(s) in another (parent) table

    One to One: Branch_Master(Br_Cod, Ctry_Cod), Branch_Sales(Br_Cod, Year, Sales)

    One to Many: Country(Ctry_Cod, Name), Branch_Master(Br_Cod, Ctry_Cod)

    Many to Many: Artist(Artist_ID, Name), Album(Album_ID, Album_Name), Link_Artist_Album(Artist_ID, Album_ID)

    Tables shown in the diagrams:
      Branch_Sales: Br_Cod (PK), Year, Sales
      Branch_Master: Br_Cod (PK), Ctry_Cod
      Country: Ctry_Cod (PK), Name
      Artist: Artist_ID (PK), Name
      Album: Album_ID (PK), Album_Name
      Link_Artist_Album: Artist_ID (PK), Album_ID (PK)
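
    The many-to-many example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite as the database, the helper function names and the sample rows ("Artist A", "Album X") are assumptions; the table and column names come from the slide.

```python
import sqlite3

def build_music_db():
    """Create Artist, Album and the link table that resolves the M:N relationship."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE Artist (Artist_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Album  (Album_ID  INTEGER PRIMARY KEY, Album_Name TEXT NOT NULL);
        -- Composite primary key built from two foreign keys
        CREATE TABLE Link_Artist_Album (
            Artist_ID INTEGER NOT NULL REFERENCES Artist(Artist_ID),
            Album_ID  INTEGER NOT NULL REFERENCES Album(Album_ID),
            PRIMARY KEY (Artist_ID, Album_ID)
        );
    """)
    # Hypothetical sample rows, not taken from the slides
    conn.executemany("INSERT INTO Artist VALUES (?, ?)", [(1, "Artist A"), (2, "Artist B")])
    conn.executemany("INSERT INTO Album VALUES (?, ?)", [(10, "Album X"), (11, "Album Y")])
    conn.executemany("INSERT INTO Link_Artist_Album VALUES (?, ?)", [(1, 10), (2, 10), (1, 11)])
    conn.commit()
    return conn

def albums_for_artist(conn, artist_id):
    """Follow the link table from one artist to all of that artist's albums."""
    rows = conn.execute("""
        SELECT al.Album_Name
        FROM Album al
        JOIN Link_Artist_Album l ON l.Album_ID = al.Album_ID
        WHERE l.Artist_ID = ?
        ORDER BY al.Album_Name
    """, (artist_id,)).fetchall()
    return [r[0] for r in rows]

def link_is_rejected(conn, artist_id, album_id):
    """Return True if a key constraint rejects the link row."""
    try:
        conn.execute("INSERT INTO Link_Artist_Album VALUES (?, ?)", (artist_id, album_id))
    except sqlite3.IntegrityError:
        return True
    return False
```

    One artist can appear on many albums and one album can have many artists, yet each (Artist_ID, Album_ID) pair is stored only once, and a link to a non-existent artist is refused.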

  • Logical and Physical Model

    LOGICAL Model: Representation of the business requirements, entities, attributes and relationships

    PHYSICAL Model: Includes tables, columns, constraints, database properties for physical implementation

    Logical Data Model | Physical Data Model
    Represents business information and defines business rules | Represents physical implementation of the model in a database
    Entity | Table
    Attribute | Column
    Primary Key | Primary Key Constraint
    Rule | Check Constraint, Default Value
    Relationship | Foreign Key
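
    The logical-to-physical mapping above can be illustrated with a small Python sketch that turns a logical description into DDL. The Entity and Attribute classes, the to_ddl function and the sample rules (Year >= 1900, Sales >= 0, default 0) are hypothetical and only mirror the mapping in the table; a modeling tool's forward engineering does far more.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    sql_type: str
    primary_key: bool = False
    rule: str = ""        # business rule, becomes a CHECK constraint
    default: str = ""     # becomes a DEFAULT value
    references: str = ""  # relationship, becomes a FOREIGN KEY ("Table(Column)")

@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)

def to_ddl(entity):
    """Map entity -> table, attribute -> column, rule -> CHECK/DEFAULT, relationship -> FK."""
    columns, constraints = [], []
    for attr in entity.attributes:
        column = f"{attr.name} {attr.sql_type}"
        if attr.default:
            column += f" DEFAULT {attr.default}"
        if attr.rule:
            column += f" CHECK ({attr.rule})"
        columns.append(column)
        if attr.references:
            constraints.append(f"FOREIGN KEY ({attr.name}) REFERENCES {attr.references}")
    pk_columns = [attr.name for attr in entity.attributes if attr.primary_key]
    if pk_columns:
        constraints.insert(0, f"PRIMARY KEY ({', '.join(pk_columns)})")
    body = ",\n    ".join(columns + constraints)
    return f"CREATE TABLE {entity.name} (\n    {body}\n);"

# Logical description of Branch_Sales; the rules and default are assumed for illustration
branch_sales = Entity("Branch_Sales", [
    Attribute("Br_Cod", "TEXT", primary_key=True, references="Branch_Master(Br_Cod)"),
    Attribute("Year", "INTEGER", rule="Year >= 1900"),
    Attribute("Sales", "REAL", default="0", rule="Sales >= 0"),
])
ddl = to_ddl(branch_sales)
print(ddl)
```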

  • Database Normalization

    NORMALIZATION is the process of efficiently organizing data in a database to meet the following goals:
      Eliminating redundant data
      Ensuring proper data dependencies

    Advantages of Normalization:
      Reduces the amount of space a database consumes
      Data is logically stored, which prevents data anomalies
      Faster processing in OLTP systems

    Normal Forms:
      First Normal Form
      Second Normal Form
      Third Normal Form

    Why not higher Normal Forms:
      Requires high-end database features
      Complexity increases, size constraints
      Most applications work well with 3NF

  • De-Normalization

    Process of introducing redundancy in a normalized database in order to address performance problems

    First normalize, then identify performance problems, exhaust normal tuning methods, and only then go for de-normalization

    De-normalize a database to reduce the number of joins required in a query, usually for reporting purposes

    FACT tables are normalized, DIMENSION tables often contain de-normalized data

    The normalized alternative to the Star Schema is the Snowflake Schema

    De-normalized Product table:
      Product: Prod_Code (PK), Prod_Name, Brand_Code, Brand_Manager

    Normalized Product tables:
      Product: Prod_Code (PK), Prod_Name, Brand_Code
      Brand: Brand_Code, Brand_Manager
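
    The trade-off between the two Product designs can be sketched in Python with sqlite3. The table name Product_Denorm, the helper functions and the sample rows are assumptions made for illustration; the column names follow the slide.

```python
import sqlite3

def build_product_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- De-normalized: Brand_Manager repeats on every product of a brand
        CREATE TABLE Product_Denorm (
            Prod_Code TEXT PRIMARY KEY, Prod_Name TEXT,
            Brand_Code TEXT, Brand_Manager TEXT);
        -- Normalized: Brand_Manager is stored once per brand
        CREATE TABLE Brand (Brand_Code TEXT PRIMARY KEY, Brand_Manager TEXT);
        CREATE TABLE Product (
            Prod_Code TEXT PRIMARY KEY, Prod_Name TEXT,
            Brand_Code TEXT REFERENCES Brand(Brand_Code));
    """)
    # Hypothetical sample data
    products = [("P1", "Soap", "B1"), ("P2", "Shampoo", "B1"), ("P3", "Tea", "B2")]
    managers = {"B1": "Manager One", "B2": "Manager Two"}
    conn.executemany("INSERT INTO Brand VALUES (?, ?)", list(managers.items()))
    conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", products)
    conn.executemany("INSERT INTO Product_Denorm VALUES (?, ?, ?, ?)",
                     [(code, name, brand, managers[brand]) for code, name, brand in products])
    conn.commit()
    return conn

def change_manager_denorm(conn, brand_code, manager):
    """Every product row of the brand must be rewritten; returns rows touched."""
    cur = conn.execute("UPDATE Product_Denorm SET Brand_Manager = ? WHERE Brand_Code = ?",
                       (manager, brand_code))
    return cur.rowcount

def change_manager_norm(conn, brand_code, manager):
    """Only the single Brand row changes; returns rows touched."""
    cur = conn.execute("UPDATE Brand SET Brand_Manager = ? WHERE Brand_Code = ?",
                       (manager, brand_code))
    return cur.rowcount

def manager_of_product(conn, prod_code):
    """The normalized design needs a join to read the manager for a product."""
    row = conn.execute("""
        SELECT b.Brand_Manager
        FROM Product p JOIN Brand b ON b.Brand_Code = p.Brand_Code
        WHERE p.Prod_Code = ?
    """, (prod_code,)).fetchone()
    return row[0]
```

    The de-normalized table answers the manager question without a join but pays for it on update; the normalized tables do the opposite.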

  • Dimensional Modeling

    Dimensional modeling (DM) is a LOGICAL design technique often used for Data Warehouses

    Composed of a central FACT Table and a set of smaller tables called DIMENSION Tables

    The physical architecture of a Dimensional Model is represented as a STAR Schema or a SNOWFLAKE Schema

    Advantages:
      Dimensional Model is a predictable, standard framework
      Extensible to accommodate unexpected new data elements and design decisions
      Supports SLOWLY CHANGING Dimensions
      Used for calculating SUMMARIZED data

  • Relational vs. Dimensional Differences: Database vs. Data Warehouse

    Relational Modeling | Dimensional Modeling
    Data is stored in RDBMS tables | Data is stored in RDBMS or MDBs / cubes
    Data is normalized and optimized for OLTP | Data is de-normalized and optimized for OLAP, DWH
    Transaction Performance | Query Performance
    Volatile (many updates) and time variant | Non-volatile and usually time invariant
    Detailed level of segregated transaction data | Aggregated data and measures used
    Normal Reports | Drag and Drop multidimensional OLAP reports

  • Relational vs. Dimensional (Cont)

    [Diagram labels: Production System, End User, Data Warehouse, ERM, DM]

    DM gives end users a better way to access the data contained in the organization's operational systems

  • Relational vs. Dimensional (Cont)

  • STAR Schema (Example)

    Fact_Sales: Time (PK), Product (PK), Geography (PK), Customer (PK), Unit_Sales, Price, Sales_Amount

    Dim_Customer: Customer (PK), Cust_Name, Cust_Phone, Email

    Dim_Product: Product (PK), Prod_Name, Prod_Desc, Category

    Dim_Time: Time (PK), Day, Month, Quarter, Year

    Dim_Geography: Geography (PK), Branch, City, State, Country
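
    The star schema above can be tried out with Python's sqlite3 module. This is a minimal sketch: the column types, the sample rows and the function names are assumptions; table and column names follow the example.

```python
import sqlite3

STAR_SCHEMA = """
CREATE TABLE Dim_Time      (Time INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Quarter TEXT, Year INTEGER);
CREATE TABLE Dim_Product   (Product INTEGER PRIMARY KEY, Prod_Name TEXT, Prod_Desc TEXT, Category TEXT);
CREATE TABLE Dim_Geography (Geography INTEGER PRIMARY KEY, Branch TEXT, City TEXT, State TEXT, Country TEXT);
CREATE TABLE Dim_Customer  (Customer INTEGER PRIMARY KEY, Cust_Name TEXT, Cust_Phone TEXT, Email TEXT);
-- The fact table holds the measures plus one key per dimension
CREATE TABLE Fact_Sales (
    Time      INTEGER REFERENCES Dim_Time(Time),
    Product   INTEGER REFERENCES Dim_Product(Product),
    Geography INTEGER REFERENCES Dim_Geography(Geography),
    Customer  INTEGER REFERENCES Dim_Customer(Customer),
    Unit_Sales INTEGER, Price REAL, Sales_Amount REAL,
    PRIMARY KEY (Time, Product, Geography, Customer)
);
"""

def build_star_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    # Hypothetical sample rows
    conn.executemany("INSERT INTO Dim_Time VALUES (?, ?, ?, ?, ?)",
                     [(1, 15, 1, "Q1", 2015), (2, 20, 4, "Q2", 2015)])
    conn.executemany("INSERT INTO Dim_Product VALUES (?, ?, ?, ?)",
                     [(1, "Pen", "Ball pen", "Stationery"), (2, "Laptop", "14 inch", "Electronics")])
    conn.execute("INSERT INTO Dim_Geography VALUES (1, 'Main', 'Pune', 'Maharashtra', 'India')")
    conn.execute("INSERT INTO Dim_Customer VALUES (1, 'Customer One', '000', 'c1@example.com')")
    conn.executemany("INSERT INTO Fact_Sales VALUES (?, ?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 10, 2.0, 20.0),
                      (2, 1, 1, 1, 5, 2.0, 10.0),
                      (1, 2, 1, 1, 1, 500.0, 500.0)])
    conn.commit()
    return conn

def sales_by_category_and_quarter(conn):
    """One join per dimension used, then aggregate the measure."""
    return conn.execute("""
        SELECT p.Category, t.Quarter, SUM(f.Sales_Amount)
        FROM Fact_Sales f
        JOIN Dim_Product p ON p.Product = f.Product
        JOIN Dim_Time    t ON t.Time    = f.Time
        GROUP BY p.Category, t.Quarter
        ORDER BY p.Category, t.Quarter
    """).fetchall()
```

    Because Quarter and Category sit directly in their dimension tables, the summary needs exactly one join per dimension.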

  • STAR Schema contd.

    Fact Tables are Normalized, Dimension Tables are De-normalized

    Advantages:
      Easier to understand and navigate
      Better performance, because it minimizes the number of joins
      Supports multi-dimensional analysis
      Extensible design supports changing business requirements
      Allows relatively easy maintenance
      Recommended for most Decision Support Systems

    Drawbacks:
      May lead to multiple dimension tables

  • SNOWFLAKE Schema (Example)

    Fact_Sales: Time (PK), Product (PK), Geography (PK), Customer (PK), Unit_Sales, Price, Sales_Amount

    Dim_Customer: Customer (PK), Cust_Name, Cust_Phone, Email

    Dim_Product: Product (PK), Prod_Name, Prod_Desc, Category

    Dim_Time: Time (PK), Day, Month

    Dim_Mth: Month (PK), Quarter

    Dim_Qtr: Quarter (PK), Year

    Dim_Year: Year (PK)

    Dim_Geography: Geography (PK), Branch, City

    Dim_City: City (PK), State

    Dim_State: State (PK), Country

    Dim_Country: Country (PK)
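
    The extra joins a snowflake schema introduces can be seen in a short Python sqlite3 sketch of the geography hierarchy. The fact table is trimmed to a single key and measure, and the sample rows and function names are assumptions made for illustration.

```python
import sqlite3

def build_snowflake_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Each level of the geography hierarchy lives in its own table
        CREATE TABLE Dim_Country   (Country TEXT PRIMARY KEY);
        CREATE TABLE Dim_State     (State TEXT PRIMARY KEY, Country TEXT REFERENCES Dim_Country(Country));
        CREATE TABLE Dim_City      (City TEXT PRIMARY KEY, State TEXT REFERENCES Dim_State(State));
        CREATE TABLE Dim_Geography (Geography INTEGER PRIMARY KEY, Branch TEXT,
                                    City TEXT REFERENCES Dim_City(City));
        -- Trimmed fact table: one dimension key and one measure only
        CREATE TABLE Fact_Sales (Geography INTEGER REFERENCES Dim_Geography(Geography),
                                 Sales_Amount REAL);
    """)
    # Hypothetical sample rows
    conn.executemany("INSERT INTO Dim_Country VALUES (?)", [("India",), ("USA",)])
    conn.executemany("INSERT INTO Dim_State VALUES (?, ?)",
                     [("Maharashtra", "India"), ("Karnataka", "India"), ("California", "USA")])
    conn.executemany("INSERT INTO Dim_City VALUES (?, ?)",
                     [("Pune", "Maharashtra"), ("Bangalore", "Karnataka"), ("San Jose", "California")])
    conn.executemany("INSERT INTO Dim_Geography VALUES (?, ?, ?)",
                     [(1, "B1", "Pune"), (2, "B2", "Bangalore"), (3, "B3", "San Jose")])
    conn.executemany("INSERT INTO Fact_Sales VALUES (?, ?)", [(1, 100.0), (2, 50.0), (3, 70.0)])
    conn.commit()
    return conn

def sales_by_country(conn):
    """Walk Geography -> City -> State to reach Country: three joins instead of one."""
    return conn.execute("""
        SELECT s.Country, SUM(f.Sales_Amount)
        FROM Fact_Sales f
        JOIN Dim_Geography g ON g.Geography = f.Geography
        JOIN Dim_City      c ON c.City      = g.City
        JOIN Dim_State     s ON s.State     = c.State
        GROUP BY s.Country
        ORDER BY s.Country
    """).fetchall()
```

    In the star schema the same question needs a single join to Dim_Geography, because Country is stored directly on that table.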

  • SNOWFLAKE Schema contd.

    Fact Tables are Normalized, Dimension Tables are Normalized

    Advantages:
      Avoids redundancy and saves storage
      Should improve understanding and overall performance
      Quick response time when queries involve aggregation

    Drawbacks:
      Complex queries and more foreign key joins
      Complicated maintenance
      Explosion in the number of tables in the database

  • Relational vs. Dimensional Modeling

    The decision to go for OLTP or a Data Warehouse is determined by the business needs of the organization

    Relational Modeling | Dimensional Modeling
    Data is stored in RDBMS tables | Data is stored in RDBMS or MDBs / cubes
    Data is normalized and optimized for OLTP | Data is de-normalized and optimized for OLAP, DWH
    Entity Driven, Transaction Performance | Data Driven, Query Performance
    Less indexed | Highly indexed
    Volatile (many updates) and time variant | Non-volatile and usually time invariant
    Detailed level of segregated transaction data | Aggregated data and measures used
    Normal Reports | Drag and Drop multidimensional OLAP reports

  • ERwin Database Design and Modeling Tool

    Effective CASE tool for Logical / Physical data modeling

    Supports Dimensional Modeling

    Entities, Attributes and Relationships can be easily defined

    ERwin talks to the back-end database: Reverse Engineering and Forward Engineering

    Subject areas facilitate the view of data marts and merging them into the Enterprise Wide Data Warehouse (EDW)

    Reports: Standard set of reports provided by ERwin

  • Things to Avoid in Data Modeling

    Vague Purpose: Don't build a model without understanding the business rationale. The purpose of a model dictates the level of detail (just entities and relationships, fully attributed, or with data types and full constraints).

    Literal Modeling: Data modeling cannot be done literally from customer inputs alone. We need to capture and solve the problem that the customer is imperfectly describing, and pay attention to the hidden true requirements. You must interpret and abstract what the customer tells you.

  • Large Size: As a general rule, a model should have no more than 200 tables, because large models involve more work. Simplify by using a higher level of abstraction. Create Subject Areas for better readability and maintenance.

    Speculative Content: At least 90 percent of a model should pertain to immediate needs. As much as 10 percent can anticipate future needs. Otherwise you run the risk of scope creep.

  • Lack of Clarity: A model should not be difficult for humans to understand. Clarity can be achieved by using DOMAINS, UDPs, etc.

    Violation of Normal Forms: An operational application concerns the routine operations of a business. An analytical application emphasizes complex queries that read large quantities of data. Do not violate normal forms, except for analytical applications and performance bottlenecks.

  • Needless Redundancy: Ideally a database should have a single recording of each data item. Don't include redundant data in an attempt to compensate for a poorly conceived application.

    Parallel Attributes: Parallel attributes are acceptable for a data warehouse and are often used in dimensions to simplify queries.

  • Anonymous Fields: As much as possible, you should clearly describe the data being stored and not use anonymous fields. Consider a location table with anonymous fields like Addr1, Addr2 and Addr3, where any information can be kept. To find a city, you must search multiple fields. It would be much better to put address information in distinct fields that are clearly named.
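
    The difference shows up directly in the queries. The sketch below uses Python's sqlite3 module; the table names Location_Anon and Location, the named columns (Street, City, State) and the sample rows are assumptions made for illustration.

```python
import sqlite3

def build_location_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Anonymous fields: any part of the address can land in any column
        CREATE TABLE Location_Anon (Loc_ID INTEGER PRIMARY KEY, Addr1 TEXT, Addr2 TEXT, Addr3 TEXT);
        -- Named fields: each column has one clear meaning
        CREATE TABLE Location (Loc_ID INTEGER PRIMARY KEY, Street TEXT, City TEXT, State TEXT);
    """)
    # Hypothetical sample rows; note the city sits in a different column in row 2
    conn.executemany("INSERT INTO Location_Anon VALUES (?, ?, ?, ?)", [
        (1, "12 Main Road", "Pune", "Maharashtra"),
        (2, "Pune", "Maharashtra", None),
        (3, "7 Pune Street", "Mumbai", "Maharashtra"),
    ])
    conn.executemany("INSERT INTO Location VALUES (?, ?, ?, ?)", [
        (1, "12 Main Road", "Pune", "Maharashtra"),
        (2, None, "Pune", "Maharashtra"),
        (3, "7 Pune Street", "Mumbai", "Maharashtra"),
    ])
    conn.commit()
    return conn

def find_city_anon(conn, city):
    """Every anonymous column must be searched because the city could be in any of them."""
    rows = conn.execute(
        "SELECT Loc_ID FROM Location_Anon WHERE ? IN (Addr1, Addr2, Addr3) ORDER BY Loc_ID",
        (city,)).fetchall()
    return [r[0] for r in rows]

def find_city_named(conn, city):
    """With a named column the query states exactly what is being asked."""
    rows = conn.execute(
        "SELECT Loc_ID FROM Location WHERE City = ? ORDER BY Loc_ID", (city,)).fetchall()
    return [r[0] for r in rows]
```

    Both functions return the same locations here, but the anonymous version depends on scanning all three columns and breaks down as soon as a street or state happens to share a name with a city.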

  • Data Quality through Data Modeling

    Data quality is the absence of intolerable defects. It is not the absence of defects.

    All data quality defects fall into a set of 9 broad buckets:
      1. Lack of referential integrity between data values across the model
      2. Entities lack unique identification
      3. Unreasonable values
      4. Attributes are used for multiple meanings
      5. Inconsistent formatting
      6. Incorrect data
      7. Missing data
      8. Miscalculations
      9. Data that falls outside of its intended codification

  • The major data modeling constructs relevant to data entry, which relate to the data quality defect categories, are:
      1. Uniqueness: enforces that a column will have unique values in it
      2. Check: guarantees that a column's value will fall in a predefined range or list
      3. Key: enforces the integrity of desired references across entities, such as the existence of the customer before he places an order
      4. Mandatory: forces a true value to be entered into a column
      5. Default: sets a value when none is entered
      6. Null: allows null (no value) to be used in a column instead of forcing a value

    Defaults and null constraints are usually more problematic to data quality when used than when not used, because they allow an abstract value (or null, which is no value) to be used in place of a customized, relevant value.
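
    These constructs map onto ordinary column and table constraints. The following Python sqlite3 sketch shows one possible mapping; the Customer and Orders tables, their columns, the quantity range and the helper names are hypothetical and chosen only to exercise each construct.

```python
import sqlite3

def build_orders_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE Customer (
            Cust_ID INTEGER PRIMARY KEY,
            Email   TEXT NOT NULL UNIQUE                                    -- Mandatory + Uniqueness
        );
        CREATE TABLE Orders (
            Order_ID INTEGER PRIMARY KEY,
            Cust_ID  INTEGER NOT NULL REFERENCES Customer(Cust_ID),         -- Key
            Quantity INTEGER NOT NULL CHECK (Quantity BETWEEN 1 AND 1000),  -- Check
            Status   TEXT NOT NULL DEFAULT 'NEW',                           -- Default
            Note     TEXT                                                   -- Null allowed
        );
    """)
    conn.execute("INSERT INTO Customer VALUES (1, 'a@example.com')")
    conn.commit()
    return conn

def violates(conn, sql, params=()):
    """Run one statement; return True if a constraint rejects it (and roll it back)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

    A short usage example, with the construct being exercised noted on each line:

```python
conn = build_orders_db()
print(violates(conn, "INSERT INTO Customer VALUES (2, 'a@example.com')"))                       # Uniqueness
print(violates(conn, "INSERT INTO Customer VALUES (3, NULL)"))                                  # Mandatory
print(violates(conn, "INSERT INTO Orders (Order_ID, Cust_ID, Quantity) VALUES (1, 99, 5)"))     # Key
print(violates(conn, "INSERT INTO Orders (Order_ID, Cust_ID, Quantity) VALUES (2, 1, 0)"))      # Check
print(violates(conn, "INSERT INTO Orders (Order_ID, Cust_ID, Quantity) VALUES (3, 1, 5)"))      # valid row
```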

  • Popular Data Modeling Tools

    Tool Name | Company Name
    ERWin | Computer Associates
    Embarcadero | Embarcadero Technologies
    Power Designer | Sybase Corp
    Oracle Designer | Oracle Corp
    Rational Rose | IBM

    All the tools listed support Forward Engineering, Reverse Engineering and ERD.

  • Q&A

  • Contact me for any queries: A. Rajesh Kumar, [email protected], No: 4918 1060

  • Thanks!
