Data Warehousing and BI - Recruitment POV

Date posted: 04-Dec-2014
Uploaded by: suvradeep-rudracsm
Description:
DW/BI understanding for a recruiter.
Page 1: Data Warehousing and BI - Recruitment POV

Growth is Life – Dhirubhai Ambani

Build :: Balance

AGENDA

• Data warehouse and BI overview

• Data warehouse data flow

• Staging area

• Transformation

• Loading

• ETL tools

• Data marts

• Business Intelligence (BI)

• OLAP

• Big Data

DATA WAREHOUSE AND BI OVERVIEW

• A data warehouse is a database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

• In addition to a relational database, a data warehouse environment includes an extraction, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
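To make the ETL part of that environment concrete, here is a minimal Python sketch using two in-memory SQLite databases, one standing in for the transactional source and one for the warehouse. The table names (`orders`, `fact_daily_sales`) and the cleanup rule are hypothetical, chosen only to show the three steps in order: extract raw rows, transform them (standardize names and aggregate), and load the result into a table built for analysis.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list[tuple[str, str, float]]:
    """Extract: pull raw order lines from the (hypothetical) transactional source."""
    return source.execute("SELECT order_date, product, amount FROM orders").fetchall()


def transform(rows):
    """Transform: standardize product names and total the sales per day and product."""
    totals: dict[tuple[str, str], float] = {}
    for order_date, product, amount in rows:
        key = (order_date, product.strip().title())
        totals[key] = totals.get(key, 0.0) + amount
    return [(day, prod, round(total, 2)) for (day, prod), total in sorted(totals.items())]


def load(warehouse: sqlite3.Connection, rows) -> None:
    """Load: append the transformed rows into the warehouse fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_daily_sales "
        "(sale_date TEXT, product TEXT, total_amount REAL)"
    )
    warehouse.executemany("INSERT INTO fact_daily_sales VALUES (?, ?, ?)", rows)
    warehouse.commit()


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")     # stands in for the OLTP system
    warehouse = sqlite3.connect(":memory:")  # stands in for the data warehouse
    source.execute("CREATE TABLE orders (order_date TEXT, product TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("2014-12-01", "widget ", 10.0),
         ("2014-12-01", "Widget", 5.5),
         ("2014-12-02", "gadget", 7.25)],
    )
    load(warehouse, transform(extract(source)))
    print(warehouse.execute("SELECT * FROM fact_daily_sales").fetchall())
```

Keeping the source and the warehouse as separate connections mirrors the point above: analytical queries run against the warehouse and never touch the transactional workload.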

• Business intelligence (BI) is defined as an organization's ability to take all its capabilities and convert them into knowledge. This produces large amounts of information that can lead to the development of new opportunities for the organization.

A FEW KEY TERMS

• Business Operation

• Business Intelligence

• Business Management

• Operational System

• Data Warehouse

• Operational Data Store

• Data Mart

• Metadata Management

STEPS TO CREATE A DATA WAREHOUSE

• Understand the business problem to be solved

• Gather requirements

• Determine appropriate end user technology to support the solution

• Build a prototype

• Develop data warehouse data model

• Map the DW requirements based on the user’s requirement definitions

• Generate ETL code

• Test the DW

• Once validated, move the data and code to production

SUBJECT

• A data warehouse is referred to as subject-oriented.

• A subject is a data subject, or major category of data relevant to the business.

• A subject is a subset of enterprise data and consists of related entities and relationships.

• Examples: Customers, Products, Sales, Geo

ENTITY

• An entity is a person, place, thing, concept, or event in which an enterprise has both the interest and the capability to capture and store information.

• Primary entity – an entity that does not depend on any other entity for its existence.

• Subtype entity – a logical division or category of a parent (supertype) entity. Example: customers can be wholesale customers or retail customers; both inherit the attributes of the parent entity.

• Attributive entity – handles a group of data for an entity that can occur multiple times.

• Associative entity – depends on two or more entities for its existence. For example, an Order associates a Customer with the Items purchased.

• Primary key – serves as the unique identifier for an entity and is used in the physical database to locate a record for storage or access.
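The entity types above can be sketched with Python dataclasses. This is an illustration only, not a physical database design: the class and field names (for example `discount_rate`, `loyalty_card`, `CustomerPhone`) are hypothetical, and subclassing is used to mimic a subtype inheriting its supertype's attributes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Customer:
    """Primary entity: exists on its own; customer_id acts as its primary key."""
    customer_id: int
    name: str


@dataclass
class WholesaleCustomer(Customer):
    """Subtype entity: inherits customer_id and name from the Customer supertype."""
    discount_rate: float = 0.0


@dataclass
class RetailCustomer(Customer):
    """Subtype entity: another category of Customer."""
    loyalty_card: Optional[str] = None


@dataclass
class CustomerPhone:
    """Attributive entity (hypothetical): a group of data that can occur many times per Customer."""
    customer_id: int
    phone_number: str


@dataclass
class Item:
    """Primary entity for the products that can be purchased."""
    item_id: int
    description: str


@dataclass
class Order:
    """Associative entity: cannot exist without a Customer and the Items purchased."""
    order_id: int
    customer: Customer
    items: list[Item] = field(default_factory=list)


wholesale = WholesaleCustomer(customer_id=1, name="Acme Stores", discount_rate=0.1)
order = Order(order_id=500, customer=wholesale, items=[Item(10, "Widget")])
print(isinstance(order.customer, Customer))  # True: a subtype is still a Customer
```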

CHARACTERISTICS OF A PRIMARY KEY (PK)

• The key is never NULL

• The key is unique by design, not by circumstance

• The key persists over time (its value does not change)

• The key is manageable – consists of integers or character strings, with no embedded symbols or odd characters

• The key should not contain any embedded intelligence (business meaning encoded in its value)
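Several of these rules can be checked mechanically against a column of candidate key values. The sketch below is a hypothetical helper, not part of any tool: it flags NULL, duplicate, and non-alphanumeric keys (treating "manageable" as letters and digits only, which is an assumption). Persistence over time and the absence of embedded intelligence cannot be verified from a single snapshot, so the sketch instead shows a meaningless, ever-increasing surrogate key, which satisfies both by construction.

```python
import re
from itertools import count
from typing import Iterable, Optional

# "Manageable" interpreted as letters and digits only (an assumption).
_MANAGEABLE = re.compile(r"[A-Za-z0-9]+")


def key_violations(keys: Iterable[Optional[object]]) -> list[str]:
    """Return readable violations of the NOT NULL, unique, and manageable rules."""
    problems: list[str] = []
    seen: set[object] = set()
    for position, key in enumerate(keys):
        if key is None:
            problems.append(f"row {position}: key is NULL")
            continue
        if key in seen:
            problems.append(f"row {position}: duplicate key {key!r}")
        seen.add(key)
        if not isinstance(key, int) and not _MANAGEABLE.fullmatch(str(key)):
            problems.append(f"row {position}: odd characters in {key!r}")
    return problems


# Surrogate keys: meaningless integers that are never reused or changed.
_next_key = count(1)


def new_surrogate_key() -> int:
    return next(_next_key)


print(key_violations([1, 2, 2, None, "AB#7"]))
```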

RELATIONSHIP

• A relationship documents the business rules that associate two entities. It describes how the two entities are naturally linked to each other.

• Example: customers can place orders.

• Cardinality – denotes the maximum number of occurrences of one entity that can be related to a single occurrence of another entity. It is usually expressed as “one” or “many”.

• Identifying relationship – the child table cannot be uniquely identified without the parent.

• Example: Account(AccountID, AccountNum, AccountTypeID), PersonAccount(AccountID, PersonID, Balance), Person(PersonID, Name)

• The Account-to-PersonAccount and Person-to-PersonAccount relationships are identifying because the child row (PersonAccount) cannot exist without having been defined in the parent (Account or Person); its key, AccountID plus PersonID, is made up of the parents' keys. In other words, there is no PersonAccount when there is no Person or no Account.

• Non-identifying relationship – the child can be identified independently of the parent.

• Example: Account(AccountID, AccountNum, AccountTypeID), AccountType(AccountTypeID, Code, Name, Description)

• The relationship between AccountType (parent) and Account (child) is non-identifying because each Account is identified by its own AccountID; AccountTypeID is only a non-key foreign-key column, so an Account can be identified without reference to the parent table.
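The two examples above translate directly into table definitions. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the helper name `build` is hypothetical. The identifying relationship shows up as the parents' keys forming PersonAccount's composite primary key, while the non-identifying one is just a foreign-key column on Account.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Person (
    PersonID INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL
);
CREATE TABLE AccountType (
    AccountTypeID INTEGER PRIMARY KEY,
    Code          TEXT,
    Name          TEXT,
    Description   TEXT
);
-- Non-identifying: AccountTypeID is an ordinary foreign key, not part of Account's key.
CREATE TABLE Account (
    AccountID     INTEGER PRIMARY KEY,
    AccountNum    TEXT NOT NULL,
    AccountTypeID INTEGER REFERENCES AccountType(AccountTypeID)
);
-- Identifying: both parent keys are part of the child's primary key.
CREATE TABLE PersonAccount (
    AccountID INTEGER NOT NULL REFERENCES Account(AccountID),
    PersonID  INTEGER NOT NULL REFERENCES Person(PersonID),
    Balance   REAL,
    PRIMARY KEY (AccountID, PersonID)
);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


conn = build()
conn.execute("INSERT INTO Person VALUES (1, 'Lennon')")
conn.execute("INSERT INTO Account VALUES (10, 'A-10', NULL)")  # allowed: type not yet known
conn.execute("INSERT INTO PersonAccount VALUES (10, 1, 250.0)")
try:
    conn.execute("INSERT INTO PersonAccount VALUES (99, 1, 0.0)")  # no Account 99
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The NULL AccountTypeID insert succeeding while the orphan PersonAccount insert fails is the practical difference: the Account exists independently of its type, but a PersonAccount cannot exist without its parents.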

NORMALIZATION

• Normalization is the process of efficiently organizing data in a database. The normalization process has two goals: eliminating redundant data (for example, storing the same data in more than one table) and ensuring that data dependencies make sense (only storing related data in a table). Both goals are worthwhile because they reduce the amount of space a database consumes and ensure that data is stored logically.

• The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered upward from one (the lowest form of normalization, referred to as first normal form or 1NF). Higher forms such as 4NF and 5NF also exist; the slides that follow cover 1NF through third normal form (3NF).

FIRST NORMAL FORM (1NF)

• Eliminate duplicative columns from the same table.

• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

• The first rule dictates that we must not duplicate data within the same row of a table. Within the database community, this concept is referred to as the atomicity of a table. Tables that comply with this rule are said to be atomic.

• Let’s explore this principle with a classic example – a table within a human resources database that stores the manager-subordinate relationship. For the purposes of our example, we’ll impose the business rule that each manager may have one or more subordinates while each subordinate may have only one manager.

STUDENT

Stud_ID | Name    | Course_ID | Units
101     | Lennon  | MSI 250   | 3.00
101     | Lennon  | MSI 415   | 3.00
125     | Johnson | MSI 331   | 3.00

Option 1: Make a determinant of the repeating group (or the multivalued attribute) part of the primary key. Here Course_ID joins Stud_ID to form the composite primary key (Stud_ID, Course_ID).

Option 2: Remove the entire repeating group from the relation. Create another relation that contains all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group together form the primary key.

STUDENT

Stud_ID | Name    | Course_ID | Units
101     | Lennon  | MSI 250   | 3.00
101     | Lennon  | MSI 415   | 3.00
125     | Johnson | MSI 331   | 3.00

STUDENT_COURSE

Stud_ID | Course_ID | Units
101     | MSI 250   | 3.00
101     | MSI 415   | 3.00
125     | MSI 331   | 3.00

STUDENT

Stud_ID | Name
101     | Lennon
125     | Johnson
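Option 2 can be sketched in a few lines of Python. The input below is a hypothetical unnormalized form of the same data, with each student's courses held as a repeating group in one record; the function moves that group into its own STUDENT_COURSE relation keyed by (Stud_ID, Course_ID).

```python
# Unnormalized STUDENT records: "Courses" is a repeating group (hypothetical input shape).
UNNORMALIZED = [
    {"Stud_ID": 101, "Name": "Lennon", "Courses": [("MSI 250", 3.00), ("MSI 415", 3.00)]},
    {"Stud_ID": 125, "Name": "Johnson", "Courses": [("MSI 331", 3.00)]},
]


def to_first_normal_form(rows):
    """Option 2: split the repeating group into a relation keyed by (Stud_ID, Course_ID)."""
    student = []         # STUDENT(Stud_ID, Name)
    student_course = []  # STUDENT_COURSE(Stud_ID, Course_ID, Units)
    for row in rows:
        student.append((row["Stud_ID"], row["Name"]))
        for course_id, units in row["Courses"]:
            student_course.append((row["Stud_ID"], course_id, units))
    return student, student_course


student, student_course = to_first_normal_form(UNNORMALIZED)
print(student)         # [(101, 'Lennon'), (125, 'Johnson')]
print(student_course)  # [(101, 'MSI 250', 3.0), (101, 'MSI 415', 3.0), (125, 'MSI 331', 3.0)]
```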

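SECOND NORMAL FORM (2NF)

• Goal: Remove partial dependencies.

STUDENT (composite primary key: Stud_ID, Course_ID)

Stud_ID | Name    | Course_ID | Units
101     | Lennon  | MSI 250   | 3.00
101     | Lennon  | MSI 415   | 3.00
125     | Johnson | MSI 331   | 3.00

Partial dependencies: Name depends only on Stud_ID, and Units depends only on Course_ID, that is, on part of the composite key rather than on the whole key.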

STUDENT (original)

Stud_ID | Name    | Course_ID | Units
101     | Lennon  | MSI 250   | 3.00
101     | Lennon  | MSI 415   | 3.00
125     | Johnson | MSI 331   | 3.00

STUDENT_COURSE

Stud_ID | Course_ID
101     | MSI 250
101     | MSI 415
125     | MSI 331

COURSE

Course_ID | Units
MSI 250   | 3.00
MSI 415   | 3.00
MSI 331   | 3.00

STUDENT

Stud_ID | Name
101     | Lennon
125     | Johnson
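The 2NF step can be sketched as two small operations: checking whether a functional dependency holds, and projecting a relation onto a subset of its columns. The helper names `determines` and `project` are hypothetical. One caveat: a check on sample rows can only show that a dependency is not contradicted by the data; real dependencies come from business rules.

```python
from typing import Sequence

Row = dict[str, object]

STUDENT = [
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 250", "Units": 3.00},
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 415", "Units": 3.00},
    {"Stud_ID": 125, "Name": "Johnson", "Course_ID": "MSI 331", "Units": 3.00},
]


def determines(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """True if the functional dependency lhs -> rhs holds in these sample rows."""
    seen: dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True


def project(rows: Sequence[Row], columns: Sequence[str]) -> list[tuple]:
    """Relational projection: keep the given columns and drop duplicate rows, preserving order."""
    result: list[tuple] = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in result:
            result.append(values)
    return result


# Each non-key attribute that depends on only part of (Stud_ID, Course_ID) is a partial dependency.
for attr, part in [("Name", ["Stud_ID"]), ("Units", ["Course_ID"])]:
    print(attr, "depends only on", part, determines(STUDENT, part, attr))

print(project(STUDENT, ["Stud_ID", "Course_ID"]))  # STUDENT_COURSE
print(project(STUDENT, ["Course_ID", "Units"]))    # COURSE
print(project(STUDENT, ["Stud_ID", "Name"]))       # STUDENT
```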

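THIRD NORMAL FORM (3NF)

• Goal: Get rid of transitive dependencies.

EMPLOYEE

Emp_ID | F_Name | L_Name | Dept_ID | Dept_Name
111    | Mary   | Jones  | 1       | Acct
122    | Sarah  | Smith  | 2       | Mktg

Transitive dependency: Emp_ID → Dept_ID → Dept_Name (Dept_Name depends on the non-key attribute Dept_ID rather than directly on the key).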

• Remove the attributes that depend on a non-key attribute from the original relation. For each transitive dependency, create a new relation with the non-key attribute that is the determinant in the transitive dependency as its primary key, and the dependent non-key attribute as a dependent attribute.

EMPLOYEE (after)

Emp_ID | F_Name | L_Name | Dept_ID
111    | Mary   | Jones  | 1
122    | Sarah  | Smith  | 2

DEPARTMENT

Dept_ID | Dept_Name
1       | Acct
2       | Mktg
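The 3NF rule above can be sketched as a single Python function. `split_transitive` is a hypothetical helper: it moves the dependent attribute (Dept_Name) out of EMPLOYEE into a new relation keyed by its determinant (Dept_ID), and refuses to proceed if the sample data contradicts the dependency.

```python
EMPLOYEE = [
    {"Emp_ID": 111, "F_Name": "Mary", "L_Name": "Jones", "Dept_ID": 1, "Dept_Name": "Acct"},
    {"Emp_ID": 122, "F_Name": "Sarah", "L_Name": "Smith", "Dept_ID": 2, "Dept_Name": "Mktg"},
]


def split_transitive(rows, determinant, dependent):
    """Move `dependent` (which depends on the non-key `determinant`) into its own relation.

    Returns (reduced_rows, new_relation), where new_relation maps determinant -> dependent.
    """
    new_relation = {}
    reduced = []
    for row in rows:
        det_value, dep_value = row[determinant], row[dependent]
        # The dependency must hold: one determinant value may map to only one dependent value.
        if new_relation.setdefault(det_value, dep_value) != dep_value:
            raise ValueError(f"{determinant}={det_value!r} maps to more than one {dependent}")
        reduced.append({k: v for k, v in row.items() if k != dependent})
    return reduced, new_relation


employee, department = split_transitive(EMPLOYEE, "Dept_ID", "Dept_Name")
print(employee[0])  # {'Emp_ID': 111, 'F_Name': 'Mary', 'L_Name': 'Jones', 'Dept_ID': 1}
print(department)   # {1: 'Acct', 2: 'Mktg'}
```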

ZACHMAN FRAMEWORK FOR ENTERPRISE ARCHITECTURES

• As you can see from Figure 4, there are 36 intersecting cells in a Zachman grid, one for each meeting point between a player's perspective (for example, business owner) and a descriptive focus (for example, data). As we move horizontally (for example, left to right) in the grid, we see different descriptions of the system, all from the same player's perspective. As we move vertically in the grid (for example, top to bottom), we see a single focus, but change the player from whose perspective we are viewing that focus.

• The first suggestion of the Zachman taxonomy is that every architectural artifact should live in one and only one cell. There should be no ambiguity about where a particular artifact lives. If it is not clear in which cell a particular artifact lives, there is most likely a problem with the artifact itself.

• The second suggestion of the Zachman taxonomy is that an architecture can be considered a complete architecture only when every cell in that architecture is complete. A cell is complete when it contains sufficient artifacts to fully define the system for one specific player looking at one specific descriptive focus.

• The third suggestion of the Zachman grid is that cells in columns should be related to each other. Consider, for example, the data column (the first column) of the Zachman grid. From the business owner's perspective, data is information about the business. From the database administrator's perspective, data is rows and columns in the database.
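The first two suggestions can be expressed as simple checks over a grid. This Python sketch assumes the commonly cited six player rows (Planner, Owner, Designer, Builder, Subcontractor, User) and six focus columns (data, function, network, people, time, motivation); those labels are not listed on these slides. `check_grid` is a hypothetical helper that flags any artifact not placed in exactly one cell and lists the cells with no artifact. "Complete" is simplified here to "non-empty", which is weaker than "sufficient artifacts to fully define the system".

```python
from enum import Enum
from itertools import product


class Perspective(Enum):  # rows: the player's perspective (assumed labels)
    PLANNER = "Planner"
    OWNER = "Owner"
    DESIGNER = "Designer"
    BUILDER = "Builder"
    SUBCONTRACTOR = "Subcontractor"
    USER = "User"


class Focus(Enum):  # columns: the descriptive focus (assumed labels)
    DATA = "What (data)"
    FUNCTION = "How (function)"
    NETWORK = "Where (network)"
    PEOPLE = "Who (people)"
    TIME = "When (time)"
    MOTIVATION = "Why (motivation)"


Cell = tuple[Perspective, Focus]


def check_grid(artifacts: dict[str, list[Cell]]) -> tuple[list[str], list[Cell]]:
    """Return (artifacts breaking the one-cell rule, cells that are still empty)."""
    misplaced = [name for name, cells in artifacts.items() if len(cells) != 1]
    filled = {cells[0] for cells in artifacts.values() if len(cells) == 1}
    empty = [cell for cell in product(Perspective, Focus) if cell not in filled]
    return misplaced, empty


artifacts = {
    "Business entity list": [(Perspective.OWNER, Focus.DATA)],
    "Physical database schema": [(Perspective.BUILDER, Focus.DATA)],
    "Mixed data/process diagram": [(Perspective.DESIGNER, Focus.DATA),
                                   (Perspective.DESIGNER, Focus.FUNCTION)],
}
misplaced, empty = check_grid(artifacts)
print(misplaced)   # ['Mixed data/process diagram']
print(len(empty))  # 34 of the 36 cells still have no artifact
```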

ZACHMAN GRID

Five ways in which the Zachman grid can help in the development of an enterprise architecture:

• Ensure that every stakeholder's perspective has been considered for every descriptive focal point.

• Improve the client's artifacts themselves by sharpening each of them to focus on one particular concern for one particular audience.

• Ensure that all of the client's business requirements can be traced down to some technical implementation.

• Convince the client that the technical team isn't planning on building a bunch of useless functionality.

• Convince the client that the business folks are including the IT folks in their planning.

THE OPEN GROUP ARCHITECTURE FRAMEWORK (TOGAF)

• The core of TOGAF is the Architecture Development Method (ADM).

• TOGAF divides an enterprise architecture into four categories:

• Business architecture—Describes the processes the business uses to meet its goals

• Application architecture—Describes how specific applications are designed and how they interact with each other

• Data architecture—Describes how the enterprise datastores are organized and accessed

• Technical architecture—Describes the hardware and software infrastructure that supports applications and their interactions

• Zachman tells you how to categorize your artifacts. TOGAF gives you a process for creating them.

DAY-TO-DAY EXPERIENCE OF CREATING AN ENTERPRISE ARCHITECTURE WILL BE DRIVEN BY THE ADM

(Figure: a high-level view of the ADM)

PHASE A & PHASE B

• The culmination of Phase A will be a Statement of Architecture Work, which must be approved by the various stakeholders before the next phase of the ADM begins. The purpose of this phase is to create an architectural vision for the first pass through the ADM cycle. The architect will guide Client in choosing the project, validate the project against the architectural principles established in the Preliminary Phase, and ensure that the appropriate stakeholders have been identified and their issues addressed.

• The Architectural Vision created in Phase A will be the main input into Phase B. Client's goal in Phase B is to create a detailed baseline and target business architecture and perform a full analysis of the gaps between them.

• Phase B is quite involved: it includes business modeling, highly detailed business analysis, and technical-requirements documentation. A successful Phase B requires input from many stakeholders. The major outputs will be a detailed description of the baseline and target business objectives, and gap descriptions of the business architecture.

PHASE C

• Develop baseline data-architecture description

• Review and validate principles, reference models, viewpoints, and tools

• Create architecture models, including logical data models, data-management process models, and relationship models that map business functions to CRUD (Create, Read, Update, Delete) data operations

• Select data-architecture building blocks

• Conduct formal checkpoint reviews of the architecture model and building blocks with stakeholders

• Review qualitative criteria (for example, performance, reliability, security, integrity)

• Complete data architecture

• Conduct checkpoint/impact analysis

• Perform gap analysis

• The most important deliverable from this phase will be the Target Information and Applications Architecture.
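The "relationship models that map business functions to CRUD data operations" in Phase C are often kept as a CRUD matrix. Below is a minimal Python sketch with a hypothetical matrix (the function and entity names are illustrative only). `entities_missing` is a hypothetical helper for one common gap check: finding entities that no business function ever creates, reads, updates, or deletes.

```python
# Hypothetical CRUD matrix: business function -> entity -> operations it performs.
CRUD_MATRIX: dict[str, dict[str, str]] = {
    "Take order":     {"Customer": "R", "Order": "C", "Item": "R"},
    "Amend order":    {"Order": "RU"},
    "Manage catalog": {"Item": "CRUD"},
    "Close account":  {"Customer": "UD", "Order": "R"},
}


def entities_missing(op: str, matrix: dict[str, dict[str, str]]) -> list[str]:
    """Entities on which no business function performs operation `op` (C, R, U, or D)."""
    entities = sorted({entity for ops in matrix.values() for entity in ops})
    return [e for e in entities if not any(op in ops.get(e, "") for ops in matrix.values())]


print(entities_missing("C", CRUD_MATRIX))  # ['Customer']: no function creates customers
print(entities_missing("D", CRUD_MATRIX))  # ['Order']: orders are never deleted
```

Gaps like these are exactly what the gap-analysis step should surface: either a business function is missing, or the matrix is incomplete.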

PHASE D & PHASE E

• Phase D completes the technical architecture: the infrastructure necessary to support the proposed new architecture. This phase is completed mostly by engaging with Client's infrastructure and technical team.

• Phase E evaluates the various implementation possibilities, identifies the major implementation projects that might be undertaken, and evaluates the business opportunity associated with each. The TOGAF standard recommends that Client's first pass at Phase E "focus on projects that will deliver short-term payoffs and so create an impetus for proceeding with longer-term projects."

• A good starting place to look for such projects is the organizational pain points that initially convinced Client's CEO to adopt an enterprise-architecture-based strategy.

PHASE F, PHASE G & PHASE H

• Phase F is closely related to Phase E. In this phase, the architect works with Client's governance body to sort the projects identified in Phase E into a priority order that considers not only the costs and benefits (identified in Phase E) but also the risk factors.

• In Phase G, Client takes the prioritized list of projects and creates architectural specifications for the implementation projects. These specifications will include acceptance criteria and lists of risks and issues.

• The final phase is H. In this phase, Client modifies the architectural change-management process with any new artifacts created in this last iteration and with new information that becomes available.
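The Phase F ordering (cost, benefit, and risk) can be sketched in Python. TOGAF does not prescribe a formula; the `priority` score below (risk-discounted benefit minus cost) is a hypothetical choice made only to illustrate sorting projects on all three factors, and the project names and figures are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    benefit: float  # estimated business benefit from Phase E (hypothetical units)
    cost: float     # estimated cost from Phase E
    risk: float     # risk factor in [0, 1], added in Phase F


def priority(p: Project) -> float:
    """Hypothetical score: benefit discounted by risk, minus cost."""
    return p.benefit * (1.0 - p.risk) - p.cost


def prioritize(projects: list[Project]) -> list[Project]:
    """Highest score first; Python's sort is stable, so ties keep their input order."""
    return sorted(projects, key=priority, reverse=True)


projects = [
    Project("Customer data mart", benefit=100, cost=40, risk=0.2),
    Project("Enterprise data warehouse", benefit=300, cost=150, risk=0.5),
    Project("Sales dashboard", benefit=60, cost=10, risk=0.1),
]
print([p.name for p in prioritize(projects)])
```

With these made-up numbers the small, low-risk projects rise to the top, which is consistent with the Phase E advice to favor short-term payoffs on the first pass.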

INFORMATION ENGINEERING (IE) NOTATION
