Date post: 17-Dec-2015
Upload: willa-french

www.ocsinfotech.in OCS Infotech Proprietary & Confidential

Typical BI solution Architecture

Typical BI Solution Architecture

[Figure: solution architecture diagram. Only the label "Admin & Finance Applications" survives.]

Typical BI Solution Server Topology

[Figure: server topology diagram. Recoverable labels: Users; Load Balancer; a component group that appears twice (SharePoint 2010, SQL Reporting Services, SQL Analysis Services – Query, SQL Relational DM); SQL Relational DB Services; SQL Integration Services; SQL Analysis Services – Process; Virtual Servers; VMWare Platform; Physical Servers or devices. The diagram also carries the tier numbers 1, 2.1, 2.2, 3.1, 3.2, 3.3 and 4.]

DW Design Methodology

1. Choosing the process
2. Choosing the grain
3. Identifying and conforming the dimensions
4. Choosing the facts
5. Storing pre-calculations in the fact table
6. Rounding out the dimension tables
7. Choosing the duration of the database
8. Tracking slowly changing dimensions
9. Deciding the query priorities and the query modes

9 Steps DW Design Methodology

Step 1: Choosing the process
– The chosen process (function) refers to the subject matter of a particular data mart, for example a Bill Payment process.

Step 2: Choosing the grain
– Decide what a record of the fact table is to represent, i.e. the grain. For example, the grain is a single payment.

Step 3: Identifying and conforming the dimensions
– Dimensions set the context for asking questions about the facts in the fact table, e.g. who made the bill payment.

Step 4: Choosing the facts
– Facts should be numeric and additive.

Step 5: Storing pre-calculations in the fact table
– Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations (denormalization).

Step 6: Rounding out the dimension tables
– Decide which properties to include in each dimension table to best describe it. They should be intuitive and understandable.

Step 7: Choosing the duration of the database
– How long to keep the data

Step 8: Tracking slowly changing dimensions
– Type 1: a changed dimension attribute is overwritten
– Type 2: a changed dimension attribute causes a new dimension record to be created
– Type 3: a changed dimension attribute causes an alternate attribute to be created, so that both the old and new values of the attribute are simultaneously accessible in the same dimension record
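
The three change-tracking types can be sketched in a few lines of Python. This is only an illustration, not part of the original deck: the `CustomerRow` record, its `city`, `previous_city`, `valid_from` and `valid_to` fields, and the function names are all hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """Hypothetical customer dimension record."""
    customer_key: int                      # surrogate key
    customer_id: str                       # business key from the source system
    city: str
    previous_city: Optional[str] = None    # only used by Type 3
    valid_from: date = date(1900, 1, 1)    # only used by Type 2
    valid_to: Optional[date] = None        # None marks the current version


def scd_type1(rows: List[CustomerRow], customer_id: str, new_city: str) -> List[CustomerRow]:
    """Type 1: overwrite the attribute; no history is kept."""
    return [replace(r, city=new_city) if r.customer_id == customer_id else r
            for r in rows]


def scd_type2(rows: List[CustomerRow], customer_id: str, new_city: str,
              change_date: date) -> List[CustomerRow]:
    """Type 2: close the current record and add a new one with a new surrogate key."""
    next_key = max(r.customer_key for r in rows) + 1
    out: List[CustomerRow] = []
    for r in rows:
        if r.customer_id == customer_id and r.valid_to is None:
            out.append(replace(r, valid_to=change_date))
            out.append(replace(r, customer_key=next_key, city=new_city,
                               valid_from=change_date, valid_to=None))
            next_key += 1
        else:
            out.append(r)
    return out


def scd_type3(rows: List[CustomerRow], customer_id: str, new_city: str) -> List[CustomerRow]:
    """Type 3: keep the old value in an alternate attribute of the same record."""
    return [replace(r, previous_city=r.city, city=new_city)
            if r.customer_id == customer_id else r
            for r in rows]
```

Type 2 is the only variant that grows the dimension table, which is why it needs a surrogate key separate from the business key.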

Step 9: Deciding the query priorities and the query modes
– Consider physical design issues
  • Indexing for performance, indexed views, partitioning, physical sort order, etc.
  • Storage, backup, security

Data Warehouse Architecture

[Figure: data warehouse architecture diagram. Recoverable labels: Source Systems (Division A, Division B, Division C, External Data); Extract, Transformation and Load (ETL); Data Staging Area; Data Warehouse Repository.]

Data Staging Area

• Subject Area Oriented

• Data Structure more closely mirrors Operational System Data Layouts

• Supports Identification of Changed Data

• Acts as a Working Area to Support the Transformation Process

Data Warehouse Repository

• Organized around Conformed Dimensions and Facts

• Promotes Usability and Intuitiveness

• Consolidated and Cross-Functional

• Historical and Atomic Representation of Data

• Insulated from Source System Modifications and Additions

Typical Star Schema

[Figure: star schema with Sales_Fact at the center, joined to five dimension tables.]

• Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Required Data (Business Metrics or Measures) ...
• Time_Dim: TimeKey, TheDate ...
• Employee_Dim: EmployeeKey, EmployeeID ...
• Product_Dim: ProductKey, ProductID ...
• Customer_Dim: CustomerKey, CustomerID ...
• Shipper_Dim: ShipperKey, ShipperID ...
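
A minimal sketch of how such a star schema is queried, using Python's built-in `sqlite3` module rather than SQL Server. Only two of the five dimensions are created to keep it short, and the `SalesAmount` measure and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database holding a cut-down version of the star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time_Dim    (TimeKey    INTEGER PRIMARY KEY, TheDate   TEXT);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT);
CREATE TABLE Sales_Fact (
    TimeKey     INTEGER REFERENCES Time_Dim(TimeKey),
    ProductKey  INTEGER REFERENCES Product_Dim(ProductKey),
    SalesAmount REAL    -- hypothetical measure
);
INSERT INTO Time_Dim    VALUES (1, '2015-01-15'), (2, '2015-04-20');
INSERT INTO Product_Dim VALUES (10, 'APPLES'), (20, 'MELONS');
INSERT INTO Sales_Fact  VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 25.0);
""")


def sales_by_product(connection: sqlite3.Connection):
    """Join the fact table to one dimension and aggregate the measure."""
    cursor = connection.execute("""
        SELECT p.ProductID, SUM(f.SalesAmount)
        FROM Sales_Fact AS f
        JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
        GROUP BY p.ProductID
        ORDER BY p.ProductID
    """)
    return cursor.fetchall()


print(sales_by_product(conn))
```

Every analytical query follows the same shape: join the fact table to whichever dimensions supply the grouping or filtering attributes, then aggregate the measures.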

Typical Snow Flake Schema

SSIS Design Best Practices

SSIS Best Practices

SSIS (SQL Server Integration Services) is the ETL tool in the Microsoft BI stack, and we follow Microsoft's recommended design best practices. A few of them are listed below.

• Making sure the designs of the staging schema(s) and the target DW schema are ready and the data mapping document is prepared

• Applying the table design best practices needed to handle high volumes of data

• Using explicit column names instead of ‘*’ in source queries so that the package buffers carry only the required columns, which improves package performance

• Setting the right values for ‘Rows per batch’ and ‘Maximum insert commit size’ whilst loading high volumes of data

• Avoiding asynchronous transformations wherever possible to get the maximum benefit from the parallel execution tree mechanism of SSIS

• Enabling logging to identify issues at run time

• Incorporating proper error handling mechanisms for dimension loads and fact loads separately

• Using ‘Checkpoints’ in the package so that, if it fails, execution can resume from the point of failure

• Auditing the packages / Data Flow tasks to track the result of package execution and other important attributes

• Using Send Mail tasks appropriately to communicate the package execution status to primary stakeholders

• Using the ‘SCD (Slowly Changing Dimensions)’ transformation when creating dimension packages to handle ongoing changes in the master tables and their maintenance

• Using an alternative to the ‘Lookup’ transformation when the data volume is high, to improve performance

• Using an appropriate incremental loading mechanism for incremental updates to fact tables

• Creating a master package that loads all the dimensions in parallel, sequencing them where necessary to respect dependencies, and then triggers the fact load packages
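
The deck does not say which incremental loading mechanism is used. One common option is a high-water mark on a modification timestamp, sketched below in plain Python rather than as an SSIS package. The `modified_at` column and the `incremental_load` function are hypothetical names.

```python
from datetime import datetime
from typing import Dict, List


def incremental_load(source_rows: List[Dict], target: List[Dict],
                     last_loaded: datetime) -> datetime:
    """Append only rows changed after the previous watermark.

    Returns the new watermark to persist for the next run.
    """
    new_rows = [r for r in source_rows if r["modified_at"] > last_loaded]
    target.extend(new_rows)
    # If nothing new arrived, the watermark stays where it was.
    return max((r["modified_at"] for r in new_rows), default=last_loaded)


source = [
    {"payment_id": 1, "amount": 100.0, "modified_at": datetime(2015, 12, 1)},
    {"payment_id": 2, "amount": 250.0, "modified_at": datetime(2015, 12, 10)},
    {"payment_id": 3, "amount": 75.0, "modified_at": datetime(2015, 12, 15)},
]
fact_table: List[Dict] = []

# First run picks up everything after 5 December; a second run finds nothing new.
watermark = incremental_load(source, fact_table, datetime(2015, 12, 5))
watermark = incremental_load(source, fact_table, watermark)
```

Persisting the returned watermark between runs is what keeps a repeated execution from loading the same fact rows twice.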

SSAS Design Best Practices

SSAS Best Practices

• Design Best Practices for Cube Dimensions
  – Consolidate multiple hierarchies into a single dimension
  – Avoid ROLAP storage mode
  – Use Parent-Child and many-to-many relationship dimensions prudently

• Design Best Practices for Attributes / Hierarchies
  – Define all possible attribute relationships
  – Remove redundant attribute relationships
  – Use natural hierarchies where possible

• Design Best Practices for Measures
  – Use the smallest numeric data type possible
  – Use semi-additive aggregate functions instead of MDX calculations to achieve the same behavior
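
To illustrate what "semi-additive" means in the last point: a measure such as a stock balance can be summed across products but not across time, where the value of the last period is wanted instead. In SSAS this is configured through the measure's aggregate function (for example LastNonEmpty); the plain-Python sketch below only mimics that behavior, and the snapshot data and function names are invented for the example.

```python
from collections import defaultdict
from typing import List, Tuple

# (month, product, closing_stock): hypothetical month-end inventory snapshots
snapshots: List[Tuple[str, str, int]] = [
    ("2015-01", "Apples", 40), ("2015-01", "Melons", 10),
    ("2015-02", "Apples", 30), ("2015-02", "Melons", 15),
]


def naive_total(rows: List[Tuple[str, str, int]]) -> int:
    """Fully additive sum: wrong for balances because it adds months together."""
    return sum(qty for _month, _product, qty in rows)


def semi_additive_total(rows: List[Tuple[str, str, int]]) -> int:
    """Sum across products, but take only the last period on the time axis."""
    per_month = defaultdict(int)
    for month, _product, qty in rows:
        per_month[month] += qty
    last_month = max(per_month)  # 'YYYY-MM' strings sort chronologically
    return per_month[last_month]


print(naive_total(snapshots), semi_additive_total(snapshots))
```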

• Creating partitions when the fact table has more than approximately 20 million rows

• Managing storage settings (MOLAP, HOLAP, ROLAP) by usage patterns

• Designing aggregations appropriately by supplying accurate counts for attributes and fact tables

• Writing optimized MDX queries

• Learning the data security that the client expects and implementing it
  – Data level
  – Role based

• Data Mining: create mining structures and add mining models to them. The possible models include:
  – Classification: decision trees, neural networks
  – Forecasting: time series analysis
  – Shopping basket: shopping basket analysis (generally used in retail to identify which commodities are purchased as a group)

Typical OLAP Cube Design

[Figure: OLAP cube with three axes. Product Dimension: Apples, Cherries, Grapes, Melons. Time Dimension: Q1, Q2, Q3, Q4. Markets Dimension: Atlanta, Chicago, Dallas, Denver.]

Typical OLAP Cube Design

[Figure: the cube again, with the same Product, Time and Markets dimensions and one cell labeled "Sales Fact".]

Storage Methodology

• Relational On-Line Analytical Processing (ROLAP): The information stored in the Data Warehouse is held in a relational structure. Aggregations are performed on the fly, either by the database or in the analysis tool.

• Multidimensional On-Line Analytical Processing (MOLAP): The information is aggregated in a predefined manner based on the characteristics of the Measures and the defined hierarchy of the Dimensions. Since the data is pre-aggregated, navigating through the hierarchies is effectively instantaneous. The user is simply navigating to a point within the Multidimensional Cube and is not performing any on-the-fly aggregations.

• Hybrid On-Line Analytical Processing (HOLAP): This is a combination of MOLAP and ROLAP. A portion of the data is pre-aggregated in a predefined manner. This would typically be the set of information that is accessed most frequently. Additional detail can be held in a ROLAP structure, allowing a user to drill through from the MOLAP structure into the ROLAP structure.
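
The difference between the three modes can be sketched in plain Python. This is a toy model under stated assumptions, not how Analysis Services stores data: the fact rows, the "All" roll-up member and the function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Detail rows: (quarter, market, product, sales_amount)
Fact = Tuple[str, str, str, float]
facts: List[Fact] = [
    ("Q1", "Dallas", "Apples", 100.0),
    ("Q1", "Dallas", "Melons", 50.0),
    ("Q1", "Denver", "Apples", 70.0),
    ("Q2", "Dallas", "Apples", 30.0),
]


def rolap_query(rows: List[Fact], quarter: Optional[str] = None,
                market: Optional[str] = None,
                product: Optional[str] = None) -> float:
    """ROLAP style: scan the detail rows and aggregate at query time."""
    return sum(amount for q, m, p, amount in rows
               if (quarter is None or q == quarter)
               and (market is None or m == market)
               and (product is None or p == product))


def build_cube(rows: List[Fact]) -> Dict[Tuple[str, str], float]:
    """MOLAP style: pre-aggregate every (quarter, market) cell, plus 'All' roll-ups."""
    cube: Dict[Tuple[str, str], float] = defaultdict(float)
    for q, m, _p, amount in rows:
        for q_member in (q, "All"):
            for m_member in (m, "All"):
                cube[(q_member, m_member)] += amount
    return dict(cube)


def holap_query(cube: Dict[Tuple[str, str], float], rows: List[Fact],
                quarter: str, market: str,
                product: Optional[str] = None) -> float:
    """HOLAP style: answer summaries from the cube, drill through to rows for detail."""
    if product is None:
        return cube.get((quarter, market), 0.0)      # pre-aggregated lookup
    return rolap_query(rows, quarter, market, product)  # detail not in the cube


cube = build_cube(facts)
print(cube[("Q1", "Dallas")], holap_query(cube, facts, "Q1", "Dallas", "Apples"))
```

The cube answers summary questions with a dictionary lookup at the cost of storing every pre-computed cell, which mirrors the speed versus storage trade-off between the modes.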

Storage Methodology Selection Criteria

Client perspective     MOLAP      HOLAP      ROLAP
Query performance      Fastest    Faster     Fast
Storage consumption    High       Medium     Low


Recommended