Date posted: 10-Jan-2016
Uploaded by: bishnupriya-panda
Data Modeling Concepts - Coverage
What is Data Modeling? Why Model?
Data Modeling Terminology
Logical and Physical Model
Normalization and De-Normalization
Dimensional Modeling
Star and Snowflake Schema
Fact and Dimension Tables
Relational vs. Dimensional Modeling
ERwin CASE Tool
What is a "Model"?
A model is a symbolic or abstract representation of something real or imagined.
What is Data Modeling?
Data modeling is a method used to define and analyze the data requirements needed to support the business processes of an organization.
A data model is a conceptual representation of the data structures (tables) required for a database.
It is powerful for expressing and communicating business requirements.
It visually represents the nature of the data, the business rules governing the data and the way the data is organized in the system.
Why Data Models?
Without a good data model, typical problems include:
Business rules and processes change over time, so a small change can lead to large changes in the computer systems.
Entity types are incorrectly identified, which leads to data replication.
There is no standardization of data, so data cannot be shared with customers or internal management.
Data Model: The Planned House
House = System
Blueprint = Data Model
A house built on a good blueprint can be used for many purposes. A system built on a good data model can support new ways of doing business, new lines of business, even new businesses, without throwing out the system.
Data Model Basic Building Blocks
Entity: Anything about which data will be collected/stored
Attribute: A characteristic of an entity
Relationship: Describes an association among entities: one-to-one (1:1), one-to-many (1:M) or many-to-many (M:N or M:M)
Constraint: A restriction placed on the data
Business Rules
Brief, precise and unambiguous descriptions of policies, procedures or principles within the organization
Describe characteristics of the data as viewed by the company
Translating Business Rules to Data Model Components
Business rules:
Standardize the company's view of data
Serve as a communication tool between users and designers
Allow the designer to understand the nature, role and scope of data
Allow the designer to understand business processes
Allow the designer to develop appropriate relationship participation rules and constraints
Promote creation of an accurate data model
When translating:
Nouns translate into entities
Verbs translate into relationships among entities
Relationships are bi-directional
Typical Data Flow
[Figure: Typical Data Flow, showing internal and external source systems, ETL, a staging area, a data integration layer, the data warehouse, data marts (with a Data Mart Builder), operational data and other custom databases, and a reporting / querying / OLAP viewing layer serving analysts' and external portals, performance and client reporting, OLAP analysis, alerts and marketing and sales teams, with metadata spanning all layers]
Data modeling is important in the staging area, the data warehouse and the data marts.
Data Modeling Terminology
ENTITY: A person, object, place or event for which data is collected
ATTRIBUTE: A parameter that defines a property of an entity
RELATIONSHIP: A business rule that determines how entities interact with each other, e.g. Sales Representative SERVES Customer
CARDINALITY: Defines the relationship between entities in terms of numbers, for example:
OPTIONAL: A Sales Representative could have zero or many customers
MANDATORY: At least one product must be listed in an order
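The cardinality examples above can be sketched with Python's built-in sqlite3 module. The table and column names (sales_rep, customer, orders, order_line) and the sample rows are hypothetical. The sketch suggests that an OPTIONAL relationship falls out of an ordinary foreign key, while a MANDATORY rule such as "at least one product per order" cannot be declared with a foreign key alone, so here it is checked with a query instead.

```python
import sqlite3

# Hypothetical tables illustrating the OPTIONAL and MANDATORY examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE sales_rep (rep_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Sales Representative SERVES Customer: each customer points to one rep,
-- so a rep may have zero or many customers (OPTIONAL on the rep side).
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    rep_id  INTEGER NOT NULL REFERENCES sales_rep(rep_id)
);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
CREATE TABLE order_line (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    product  TEXT    NOT NULL,
    PRIMARY KEY (order_id, product)
);
""")
conn.executemany("INSERT INTO sales_rep VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.execute("INSERT INTO customer VALUES (10, 'Acme', 1)")    # rep 2 has no customers
conn.executemany("INSERT INTO orders VALUES (?)", [(100,), (101,)])
conn.execute("INSERT INTO order_line VALUES (100, 'Widget')")  # order 101 has no products

# Customers per rep: zero is allowed (OPTIONAL).
customers_per_rep = conn.execute("""
    SELECT r.name, COUNT(c.cust_id)
    FROM sales_rep r LEFT JOIN customer c ON c.rep_id = r.rep_id
    GROUP BY r.rep_id ORDER BY r.rep_id
""").fetchall()

# MANDATORY: a foreign key cannot force a parent to have children,
# so this query finds orders that break "at least one product per order".
orders_without_products = conn.execute("""
    SELECT o.order_id FROM orders o
    WHERE NOT EXISTS (SELECT 1 FROM order_line l WHERE l.order_id = o.order_id)
""").fetchall()

print(customers_per_rep)        # [('Asha', 1), ('Ben', 0)]
print(orders_without_products)  # [(101,)]
```

In a production design the mandatory check might instead live in a trigger or in application logic; the query form is used here only because it is the simplest to show.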
Data Modeling Terminology (contd.)
PRIMARY KEY: Column(s) that uniquely identify each record in a table
FOREIGN KEY: Column(s) in one table that refer to column(s) in another (parent) table
One to One: Branch_Master(Br_Cod (PK), Ctry_Cod) and Branch_Sales(Br_Cod (PK), Year, Sales), where each branch has at most one Branch_Sales row
One to Many: Country(Ctry_Cod (PK), Name) and Branch_Master(Br_Cod (PK), Ctry_Cod), where one country has many branches
Many to Many: Artist(Artist_ID (PK), Name) and Album(Album_ID (PK), Album_Name), linked through Link_Artist_Album(Artist_ID (PK), Album_ID (PK))
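A minimal sketch of the key and relationship examples above, using Python's sqlite3 module. The table and column names follow the slide; the sample values (country, branches, artists, albums) are made up. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One to Many: one Country row, many Branch_Master rows.
CREATE TABLE Country (Ctry_Cod TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Branch_Master (
    Br_Cod   TEXT PRIMARY KEY,
    Ctry_Cod TEXT REFERENCES Country(Ctry_Cod)
);
-- One to One: Branch_Sales reuses Branch_Master's primary key as its own.
CREATE TABLE Branch_Sales (
    Br_Cod TEXT PRIMARY KEY REFERENCES Branch_Master(Br_Cod),
    Year   INTEGER,
    Sales  REAL
);
-- Many to Many: resolved through a link table with a composite primary key.
CREATE TABLE Artist (Artist_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album  (Album_ID  INTEGER PRIMARY KEY, Album_Name TEXT);
CREATE TABLE Link_Artist_Album (
    Artist_ID INTEGER REFERENCES Artist(Artist_ID),
    Album_ID  INTEGER REFERENCES Album(Album_ID),
    PRIMARY KEY (Artist_ID, Album_ID)
);
""")
conn.execute("INSERT INTO Country VALUES ('IN', 'India')")
conn.executemany("INSERT INTO Branch_Master VALUES (?, ?)", [("B1", "IN"), ("B2", "IN")])
conn.execute("INSERT INTO Branch_Sales VALUES ('B1', 2015, 1200.0)")

# The foreign key rejects a branch that points to a country that does not exist.
try:
    conn.execute("INSERT INTO Branch_Master VALUES ('B3', 'XX')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

conn.executemany("INSERT INTO Artist VALUES (?, ?)", [(1, "Artist A"), (2, "Artist B")])
conn.executemany("INSERT INTO Album VALUES (?, ?)", [(10, "Duets"), (11, "Solo")])
conn.executemany("INSERT INTO Link_Artist_Album VALUES (?, ?)", [(1, 10), (2, 10), (1, 11)])

# Which artists appear on album 10? (one album, many artists)
artists_on_duets = conn.execute("""
    SELECT a.Name FROM Artist a
    JOIN Link_Artist_Album l ON l.Artist_ID = a.Artist_ID
    WHERE l.Album_ID = 10 ORDER BY a.Artist_ID
""").fetchall()
print(fk_rejected, artists_on_duets)  # True [('Artist A',), ('Artist B',)]
```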
Logical and Physical Model
LOGICAL model: A representation of the business requirements, i.e. entities, attributes and relationships
PHYSICAL model: Includes the tables, columns, constraints and database properties needed for physical implementation
Logical Data Model | Physical Data Model
Represents business information and defines business rules | Represents the physical implementation of the model in a database
Entity | Table
Attribute | Column
Primary Key | Primary Key Constraint
Rule | Check Constraint, Default Value
Relationship | Foreign Key
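To make the logical-to-physical mapping in the table concrete, here is a small hypothetical translator. The Entity and Attribute classes and the to_ddl function are illustrative names, not part of ERwin or any other tool, and the Status attribute is invented to carry a rule. It assumes a foreign key column has the same name as the parent table's key, and it runs the generated DDL in an in-memory SQLite database.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Attribute:
    """Logical attribute; becomes a physical column."""
    name: str
    sql_type: str
    rule: str = ""  # logical rule; becomes NOT NULL, a CHECK constraint or a DEFAULT value

@dataclass
class Entity:
    """Logical entity; becomes a physical table."""
    name: str
    primary_key: str  # logical primary key; becomes a PRIMARY KEY constraint
    attributes: list
    # Logical relationships; become FOREIGN KEYs. Maps column -> parent table
    # (assumes the parent's key column has the same name).
    relationships: dict = field(default_factory=dict)

def to_ddl(entity: Entity) -> str:
    """Translate one logical entity into a CREATE TABLE statement."""
    columns = []
    for attr in entity.attributes:
        col = f"{attr.name} {attr.sql_type} {attr.rule}".strip()
        if attr.name == entity.primary_key:
            col += " PRIMARY KEY"
        if attr.name in entity.relationships:
            col += f" REFERENCES {entity.relationships[attr.name]}({attr.name})"
        columns.append(col)
    return f"CREATE TABLE {entity.name} ({', '.join(columns)})"

country = Entity("Country", "Ctry_Cod",
                 [Attribute("Ctry_Cod", "TEXT"), Attribute("Name", "TEXT", "NOT NULL")])
branch = Entity("Branch_Master", "Br_Cod",
                [Attribute("Br_Cod", "TEXT"),
                 Attribute("Ctry_Cod", "TEXT", "NOT NULL"),
                 # Hypothetical attribute carrying a business rule.
                 Attribute("Status", "TEXT",
                           "DEFAULT 'OPEN' CHECK (Status IN ('OPEN', 'CLOSED'))")],
                relationships={"Ctry_Cod": "Country"})

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
for entity in (country, branch):  # parents before children
    conn.execute(to_ddl(entity))

conn.execute("INSERT INTO Country VALUES ('IN', 'India')")
conn.execute("INSERT INTO Branch_Master (Br_Cod, Ctry_Cod) VALUES ('B1', 'IN')")
status = conn.execute("SELECT Status FROM Branch_Master WHERE Br_Cod = 'B1'").fetchone()[0]

try:  # the CHECK constraint derived from the rule rejects an unknown status
    conn.execute("INSERT INTO Branch_Master VALUES ('B2', 'IN', 'TEMP')")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

print(to_ddl(branch))
print(status, check_rejected)  # OPEN True
```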
Database Normalization
NORMALIZATION is the process of efficiently organizing data in a database to meet the following goals:
Eliminating redundant data
Ensuring proper data dependencies
Advantages of Normalization
Reduces the amount of space a database consumes
Data is stored logically, which prevents data anomalies
Faster processing in OLTP systems
Normal Forms
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Why not higher normal forms?
They require high-end database features
Complexity increases, along with size constraints
Most applications work well with 3NF
De-Normalization
The process of introducing redundancy into a normalized database in order to address performance problems
First normalize; then identify performance problems and exhaust normal tuning methods; only then go for de-normalization
De-normalize a database to reduce the number of joins required in a query, usually for reporting purposes
FACT tables are normalized; DIMENSION tables often contain de-normalized data
The normalized alternative to the Star Schema is the Snowflake Schema
De-normalized Product table vs. normalized Product tables:
De-normalized: Product(Prod_Code (PK), Prod_Name, Brand_Code, Brand_Manager)
Normalized: Product(Prod_Code (PK), Prod_Name, Brand_Code) and Brand(Brand_Code (PK), Brand_Manager)
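The Product/Brand example is a third-normal-form fix: Brand_Manager depends on Brand_Code rather than on the key Prod_Code (a transitive dependency). A plain-Python sketch of the decomposition, with invented sample rows, shows the manager being stored once per brand and the de-normalized view being rebuilt by a join when a report needs it.

```python
# Hypothetical sample rows for the de-normalized Product table:
# (Prod_Code, Prod_Name, Brand_Code, Brand_Manager)
denormalized = [
    ("P1", "Shampoo", "B10", "Meena"),
    ("P2", "Soap",    "B10", "Meena"),   # Brand_Manager repeated for B10
    ("P3", "Biscuit", "B20", "Ravi"),
]

# Normalized Product table: drop the transitively dependent column.
product = [(code, name, brand_code) for code, name, brand_code, _ in denormalized]

# Normalized Brand table: one manager per brand, stored once.
brand = {}
for _, _, brand_code, manager in denormalized:
    # Two different managers for one brand would be an anomaly in the source rows.
    if brand.setdefault(brand_code, manager) != manager:
        raise ValueError(f"inconsistent manager for brand {brand_code}")

# Changing a brand manager is now one update instead of one per product row.
brand["B10"] = "Kiran"

# Re-joining reproduces the de-normalized view for reporting.
rejoined = [(code, name, b, brand[b]) for code, name, b in product]
print(len(brand), rejoined[1])  # 2 ('P2', 'Soap', 'B10', 'Kiran')
```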
Dimensional Modeling
Dimensional modeling (DM) is a LOGICAL design technique often used for data warehouses
A dimensional model is composed of a central FACT table and a set of smaller tables called DIMENSION tables
The physical architecture of a dimensional model is represented as a STAR schema or a SNOWFLAKE schema
Advantages
A dimensional model is a predictable, standard framework
Extensible to accommodate unexpected new data elements and design decisions
Supports SLOWLY CHANGING dimensions
Used for calculating SUMMARIZED data
Relational vs. Dimensional: Differences (Database vs. Data Warehouse)
Relational Modeling | Dimensional Modeling
Data is stored in RDBMS tables | Data is stored in RDBMS tables or multidimensional databases (MDBs) / cubes
Data is normalized and optimized for OLTP | Data is de-normalized and optimized for OLAP and the DWH
Optimized for transaction performance | Optimized for query performance
Volatile (many updates); holds mostly current values | Non-volatile and time variant (keeps history)
Detailed level of segregated transaction data | Aggregated data and measures
Standard reports | Drag-and-drop multidimensional OLAP reports
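Relational vs. Dimensional (contd.)
[Figure: ERM, where the end user works directly against the production system, compared with DM, where the production system feeds a data warehouse that the end user queries]
DM gives end users a better way to access the data contained in the organization's operational systems.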
STAR Schema (Example)
Fact_Sales(Time (PK), Product (PK), Geography (PK), Customer (PK), Unit_Sales, Price, Sales_Amount)
Dim_Customer(Customer (PK), Cust_Name, Cust_Phone, Email)
Dim_Product(Product (PK), Prod_Name, Prod_Desc, Category)
Dim_Time(Time (PK), Day, Month, Quarter, Year)
Dim_Geography(Geography (PK), Branch, City, State, Country)
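A minimal sketch of the star schema above in SQLite, assuming the four (PK) columns of Fact_Sales together form its composite primary key and each one refers to its dimension table. The sample rows are hypothetical. The query needs at most one join per dimension because each de-normalized dimension carries its whole hierarchy (for example, Dim_Geography holds City, State and Country).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Time      (Time INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER,
                            Quarter INTEGER, Year INTEGER);
CREATE TABLE Dim_Product   (Product INTEGER PRIMARY KEY, Prod_Name TEXT,
                            Prod_Desc TEXT, Category TEXT);
CREATE TABLE Dim_Geography (Geography INTEGER PRIMARY KEY, Branch TEXT, City TEXT,
                            State TEXT, Country TEXT);
CREATE TABLE Dim_Customer  (Customer INTEGER PRIMARY KEY, Cust_Name TEXT,
                            Cust_Phone TEXT, Email TEXT);
-- The fact table holds the measures plus one key per dimension.
CREATE TABLE Fact_Sales (
    Time      INTEGER REFERENCES Dim_Time(Time),
    Product   INTEGER REFERENCES Dim_Product(Product),
    Geography INTEGER REFERENCES Dim_Geography(Geography),
    Customer  INTEGER REFERENCES Dim_Customer(Customer),
    Unit_Sales INTEGER, Price REAL, Sales_Amount REAL,
    PRIMARY KEY (Time, Product, Geography, Customer)
);

-- Hypothetical sample rows.
INSERT INTO Dim_Time VALUES (1, 5, 1, 1, 2015), (2, 12, 4, 2, 2015);
INSERT INTO Dim_Product VALUES (1, 'Soap', 'Bath soap', 'Personal Care'),
                               (2, 'Tea', 'Green tea', 'Beverages');
INSERT INTO Dim_Geography VALUES (1, 'B1', 'Chennai', 'Tamil Nadu', 'India');
INSERT INTO Dim_Customer VALUES (1, 'Acme', '555-0100', 'acme@example.com');
INSERT INTO Fact_Sales VALUES (1, 1, 1, 1, 10, 2.0, 20.0),
                              (2, 1, 1, 1, 5, 2.0, 10.0),
                              (2, 2, 1, 1, 4, 3.0, 12.0);
""")

# Sales by quarter and category for one country: one join per dimension used.
result = conn.execute("""
    SELECT t.Quarter, p.Category, SUM(f.Sales_Amount)
    FROM Fact_Sales f
    JOIN Dim_Time      t ON t.Time      = f.Time
    JOIN Dim_Product   p ON p.Product   = f.Product
    JOIN Dim_Geography g ON g.Geography = f.Geography
    WHERE g.Country = 'India'
    GROUP BY t.Quarter, p.Category
    ORDER BY t.Quarter, p.Category
""").fetchall()
print(result)  # [(1, 'Personal Care', 20.0), (2, 'Beverages', 12.0), (2, 'Personal Care', 10.0)]
```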
STAR Schema (contd.)
Fact tables are normalized; dimension tables are de-normalized
Advantages
Easier to understand and navigate
Better performance, because it minimizes the number of joins
Supports multi-dimensional analysis
Extensible design supports changing business requirements
Allows relatively easy maintenance
Recommended for most decision support systems
Drawbacks
May lead to multiple dimension tables
SNOWFLAKE Schema (Example)
Fact_Sales(Time (PK), Product (PK), Geography (PK), Customer (PK), Unit_Sales, Price, Sales_Amount)
Dim_Customer(Customer (PK), Cust_Name, Cust_Phone, Email)
Dim_Product(Product (PK), Prod_Name, Prod_Desc, Category)
Dim_Time(Time (PK), Day, Month)
Dim_Mth(Month (PK), Quarter)
Dim_Qtr(Quarter (PK), Year)
Dim_Year(Year (PK))
Dim_Geography(Geography (PK), Branch, City)
Dim_City(City (PK), State)
Dim_State(State (PK), Country)
Dim_Country(Country (PK))
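For comparison, a sketch of the snowflaked geography hierarchy from the example above (Dim_Geography to Dim_City to Dim_State to Dim_Country) in SQLite. Fact_Sales is trimmed to one dimension key and one measure, and the sample rows are invented. Reporting by state now needs a chain of joins instead of the single Dim_Geography join a star schema would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Country   (Country TEXT PRIMARY KEY);
CREATE TABLE Dim_State     (State TEXT PRIMARY KEY, Country TEXT REFERENCES Dim_Country(Country));
CREATE TABLE Dim_City      (City TEXT PRIMARY KEY, State TEXT REFERENCES Dim_State(State));
CREATE TABLE Dim_Geography (Geography INTEGER PRIMARY KEY, Branch TEXT,
                            City TEXT REFERENCES Dim_City(City));
-- Fact_Sales trimmed to the Geography key and one measure for brevity.
CREATE TABLE Fact_Sales (Geography INTEGER REFERENCES Dim_Geography(Geography),
                         Sales_Amount REAL);

-- Hypothetical sample rows.
INSERT INTO Dim_Country VALUES ('India');
INSERT INTO Dim_State VALUES ('Tamil Nadu', 'India'), ('Karnataka', 'India');
INSERT INTO Dim_City VALUES ('Chennai', 'Tamil Nadu'), ('Bengaluru', 'Karnataka');
INSERT INTO Dim_Geography VALUES (1, 'B1', 'Chennai'), (2, 'B2', 'Bengaluru');
INSERT INTO Fact_Sales VALUES (1, 20.0), (1, 10.0), (2, 12.0);
""")

# Each level of the hierarchy is one more foreign key join.
sales_by_state = conn.execute("""
    SELECT c.Country, s.State, SUM(f.Sales_Amount)
    FROM Fact_Sales f
    JOIN Dim_Geography g  ON g.Geography = f.Geography
    JOIN Dim_City      ci ON ci.City     = g.City
    JOIN Dim_State     s  ON s.State     = ci.State
    JOIN Dim_Country   c  ON c.Country   = s.Country
    GROUP BY c.Country, s.State
    ORDER BY s.State
""").fetchall()
print(sales_by_state)  # [('India', 'Karnataka', 12.0), ('India', 'Tamil Nadu', 30.0)]
```

The geography and state names are stored once each here, which is the storage saving claimed for the snowflake schema; the extra joins are the cost listed among its drawbacks.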
SNOWFLAKE Schema (contd.)
Fact tables are normalized; dimension tables are also normalized
Advantages
Avoids redundancy and saves storage
Should improve understanding and overall performance
Quick response time when queries involve aggregation
Drawbacks
Complex queries and more foreign key joins
Complicated maintenance
Explosion in the number of tables in the database
Relational vs. Dimensional Modeling
The decision to go for OLTP or a data warehouse is determined by the business needs of the organization.
Relational Modeling | Dimensional Modeling
Data is stored in RDBMS tables | Data is stored in RDBMS tables or multidimensional databases (MDBs) / cubes
Data is normalized and optimized for OLTP | Data is de-normalized and optimized for OLAP and the DWH
Entity driven; optimized for transaction performance | Data driven; optimized for query performance
Less indexed | Highly indexed
Volatile (many updates); holds mostly current values | Non-volatile and time variant (keeps history)
Detailed level of segregated transaction data | Aggregated data and measures
Standard reports | Drag-and-drop multidimensional OLAP reports
ERwin Database Design and Modeling Tool
An effective CASE tool for logical / physical data modeling
Supports dimensional modeling
Entities, attributes and relationships can be easily defined
ERwin connects to the back-end database for reverse engineering and forward engineering
Subject areas make it easier to view data marts and to merge them into the Enterprise Data Warehouse (EDW)
Reports: ERwin provides a standard set of reports
Things to Avoid in Data Modeling
Vague Purpose: Don't build a model without understanding the business rationale. The purpose of a model dictates its level of detail (just entities and relationships; fully attributed; or with data types and full constraints).
Literal Modeling: Data modeling cannot be done literally from customer inputs alone. We need to capture and solve the problem that the customer is imperfectly describing, paying attention to the hidden, true requirements. You must interpret and abstract what the customer tells you.
Large Size: As a general rule, a model should have no more than 200 tables, because large models involve more work. Simplify with a high level of abstraction, and create subject areas for better readability and maintenance.
Speculative Content: At least 90 percent of a model should pertain to immediate needs; as much as 10 percent can anticipate future needs. Otherwise you run the risk of scope creep.
Lack of Clarity: A model should not be difficult for humans to understand. Clarity can be improved by using DOMAINS, UDPs (user-defined properties), etc.
Violation of Normal Forms: An operational application concerns the routine operations of a business. An analytical application emphasizes complex queries that read large quantities of data. Do not violate normal forms, except for analytical applications and performance bottlenecks.
Needless Redundancy: Ideally, a database should have a single recording of each data item. Don't include redundant data in an attempt to compensate for a poorly conceived application.
Parallel Attributes: Parallel attributes are acceptable for a data warehouse and are often used in dimensions to simplify queries.
Anonymous Fields: As much as possible, clearly describe the data being stored rather than using anonymous fields. For example, a location table may have anonymous fields like Addr1, Addr2 and Addr3, where any information can be kept; to find a city, you must search multiple fields. It is much better to put address information in distinct, clearly named fields.
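A small SQLite sketch of the anonymous-fields problem, using hypothetical Location_Anon and Location tables and made-up addresses. With Addr1 to Addr3, a search for a city has to inspect every field because the city may sit in any of them; with a named City column it is a single predicate that an index on City can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anonymous fields: any part of the address can end up in any column.
CREATE TABLE Location_Anon (Loc_ID INTEGER PRIMARY KEY, Addr1 TEXT, Addr2 TEXT, Addr3 TEXT);
-- Clearly named fields: each part of the address has exactly one column.
CREATE TABLE Location (Loc_ID INTEGER PRIMARY KEY, Street TEXT, City TEXT, Postal_Code TEXT);
CREATE INDEX idx_location_city ON Location (City);

-- Hypothetical sample rows.
INSERT INTO Location_Anon VALUES
    (1, '12 Anna Salai', 'Chennai', '600002'),
    (2, 'Chennai', '4 Mount Road', '600006'),   -- city typed into Addr1
    (3, '7 MG Road', 'Bengaluru', '560001');
INSERT INTO Location VALUES
    (1, '12 Anna Salai', 'Chennai', '600002'),
    (2, '4 Mount Road', 'Chennai', '600006'),
    (3, '7 MG Road', 'Bengaluru', '560001');
""")

# Anonymous fields: the city could be in any of the three columns.
anon_hits = conn.execute(
    "SELECT Loc_ID FROM Location_Anon WHERE 'Chennai' IN (Addr1, Addr2, Addr3) ORDER BY Loc_ID"
).fetchall()

# Named fields: one predicate on one column.
named_hits = conn.execute(
    "SELECT Loc_ID FROM Location WHERE City = 'Chennai' ORDER BY Loc_ID"
).fetchall()

print(anon_hits, named_hits)  # [(1,), (2,)] [(1,), (2,)]
```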
Data Quality through Data Modeling
Data quality is the absence of intolerable defects, not the absence of all defects. All data quality defects fall into 9 broad buckets:
1. Lack of referential integrity between data values across the model
2. Entities lack unique identification
3. Unreasonable values
4. Attributes used for multiple meanings
5. Inconsistent formatting
6. Incorrect data
7. Missing data
8. Miscalculations
9. Data that falls outside its intended codification
The major data modeling constructs relevant to data entry, which relate to the data quality defect categories, are:
1. Uniqueness: enforces that a column will have unique values in it
2. Check: guarantees that a column's value will fall in a predefined range or list
3. Key: enforces the integrity of desired references across entities, such as the existence of a customer before that customer places an order
4. Mandatory: forces an actual value to be entered into a column
5. Default: sets a value when none is entered
6. Null: allows null (no value) to be used in a column instead of forcing a value
Defaults and null constraints are usually more problematic for data quality when used than when not used, because they allow an abstract value (or null, which is no value) to be used in place of a customized, relevant value.
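The six constructs above map onto standard column constraints. The following SQLite sketch uses hypothetical Customer and Orders tables and sample values; each rejected insert corresponds to one defect category the constraint prevents. Enforcement details vary between database products, so treat this as an illustration under SQLite's rules (for example, the Key constraint is only enforced after PRAGMA foreign_keys = ON).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (
    Cust_ID INTEGER PRIMARY KEY,
    Email   TEXT NOT NULL UNIQUE,                                     -- 4. Mandatory, 1. Uniqueness
    Segment TEXT NOT NULL CHECK (Segment IN ('RETAIL', 'CORPORATE'))  -- 2. Check
);
CREATE TABLE Orders (
    Order_ID INTEGER PRIMARY KEY,
    Cust_ID  INTEGER NOT NULL REFERENCES Customer(Cust_ID),  -- 3. Key
    Status   TEXT NOT NULL DEFAULT 'NEW',                    -- 5. Default
    Notes    TEXT                                            -- 6. Null allowed
);
""")
conn.execute("INSERT INTO Customer VALUES (1, 'a@example.com', 'RETAIL')")

# Each statement violates exactly one constraint.
bad_rows = {
    "duplicate email": "INSERT INTO Customer VALUES (2, 'a@example.com', 'RETAIL')",
    "segment out of list": "INSERT INTO Customer VALUES (3, 'b@example.com', 'VIP')",
    "missing email": "INSERT INTO Customer (Cust_ID, Segment) VALUES (4, 'RETAIL')",
    "order for unknown customer": "INSERT INTO Orders (Order_ID, Cust_ID) VALUES (10, 99)",
}
rejected = []
for label, sql in bad_rows.items():
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        rejected.append(label)

# A valid order: Status falls back to the DEFAULT, Notes is left NULL.
conn.execute("INSERT INTO Orders (Order_ID, Cust_ID) VALUES (11, 1)")
status, notes = conn.execute(
    "SELECT Status, Notes FROM Orders WHERE Order_ID = 11").fetchone()

print(rejected)       # all four labels
print(status, notes)  # NEW None
```

The last insert also illustrates the caution above: Status received 'NEW' and Notes received null without the user supplying either value.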
Popular Data Modeling Tools
Tool Name | Company Name
ERwin | Computer Associates
Embarcadero (ER/Studio) | Embarcadero Technologies
PowerDesigner | Sybase Corp
Oracle Designer | Oracle Corp
Rational Rose | IBM
All the above tools support forward engineering, reverse engineering and ER diagrams (ERDs).
Q&A
Contact me for any queries: A. Rajesh Kumar, No: 4918 1060
Thanks!