ITEC 423 Data Warehousing and Data Mining Lecture 3.

Date post: 16-Dec-2015
Upload: stewart-hall
Page 1: ITEC 423 Data Warehousing and Data Mining Lecture 3.

ITEC 423 Data Warehousing and Data Mining

Lecture 3

Page 2: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Architecture

Architecture is the art and science of designing buildings and other structures.

In systems, architecture is a design decision that is usually not easily changed.

There are many different architectural choices available, with different solutions for:

• Data transfer
• Data staging area
• Data storage
• Information delivery

Page 3: ITEC 423 Data Warehousing and Data Mining Lecture 3.

A General Data Warehouse Architecture

Page 4: ITEC 423 Data Warehousing and Data Mining Lecture 3.

A General Data Warehouse Architecture with Staging Area

Page 5: ITEC 423 Data Warehousing and Data Mining Lecture 3.

A General Data Warehouse Architecture with Staging Area and Data Marts

Page 6: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Architectural Types

Page 7: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Architectural Types: Centralized Data Warehouse

Takes into account the enterprise-level information requirements

Atomic level data at the lowest level of granularity is stored

Some summarized data may be included

Queries and applications access the central data warehouse.

No separate data marts

Page 8: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Architectural Types: Independent Data Marts

Evolves in companies where the organizational units develop their own data marts for their own specific purposes

Each data mart serves a particular organizational unit

More than one version of the truth may be found

Data marts are independent of one another

Different data marts may have inconsistent data definitions and standards

Such variances hinder analysis of data across data marts.

Page 9: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Architectural Types: Federated

An existing legacy of an assortment of DSS in the form of operational systems, extracted datasets, primitive data marts, …

May not be possible to discard investment and start from scratch

Practical solution is a federated architectural type

Data may be physically or logically integrated through shared key fields, overall global metadata, distributed queries, and other such methods

No one overall data warehouse
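
As a rough illustration of logical integration through a shared key field, the sketch below merges two independently owned data sets at query time instead of loading them into one warehouse. The data set names, the `customer_id` style keys, and the `federated_view` helper are all hypothetical and are not part of the lecture.

```python
# Two independently maintained data sets (hypothetical contents).
# Both happen to be keyed by the same customer identifier.
sales_mart = {
    "C001": {"total_orders": 12},
    "C002": {"total_orders": 3},
}
support_extract = {
    "C001": {"open_tickets": 1},
    "C003": {"open_tickets": 4},
}


def federated_view(*sources):
    """Combine records from several sources on their shared key.

    The sources stay where they are; the combined view is built on request.
    """
    combined = {}
    for source in sources:
        for key, attributes in source.items():
            combined.setdefault(key, {}).update(attributes)
    return combined


view = federated_view(sales_mart, support_extract)
print(view["C001"])
```

Customers that appear in only one source still show up in the view, but with only the attributes that source holds, which is one reason a federated setup does not replace a single integrated warehouse.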

Page 10: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Architectural Types: Data-Mart Bus

Conformed supermarts approach

Analyzing requirements for a specific business subject such as orders, shipments, billings, insurance claims, car rentals ...

Build the first data mart (supermart) using business dimensions and metrics

These business dimensions will be shared in the future data marts.

Conform dimensions among the various data marts

Result would be logically integrated supermarts that will provide an enterprise view of the data

Data marts contain atomic data organized as a dimensional data model

Results from adopting an enhanced bottom-up approach to data warehouse development
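
A minimal sketch of what sharing a conformed dimension means in practice: two data marts keep their own atomic fact rows but refer to one common product dimension, so results from both can be compared category by category. The table contents and the `total_by_category` helper are assumptions made for illustration.

```python
# Conformed dimension: a single shared definition of "product" (hypothetical rows).
product_dim = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Manual", "category": "Books"},
}

# Two data marts holding atomic fact rows that use the same product keys.
orders_facts = [
    {"product_key": 1, "order_amount": 100.0},
    {"product_key": 2, "order_amount": 20.0},
    {"product_key": 1, "order_amount": 50.0},
]
shipments_facts = [
    {"product_key": 1, "units_shipped": 5},
    {"product_key": 2, "units_shipped": 2},
]


def total_by_category(facts, measure):
    """Roll a measure up to product category using the shared dimension."""
    totals = {}
    for row in facts:
        category = product_dim[row["product_key"]]["category"]
        totals[category] = totals.get(category, 0) + row[measure]
    return totals


orders_by_category = total_by_category(orders_facts, "order_amount")
shipments_by_category = total_by_category(shipments_facts, "units_shipped")
print(orders_by_category, shipments_by_category)
```

Because both marts resolve `product_key` against the same dimension, "Hardware" means the same thing in the orders result and in the shipments result.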

Page 11: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Architectural Types: Hub and Spoke

Similar to the centralized data warehouse architecture: enterprise-wide data warehouse

Atomic data is stored in the centralized data warehouse

The centralized data warehouse feeds data to the dependent data marts on the spokes

Dependent data marts may be developed for departmental analytical needs, specialized queries, data mining ...

Dependent data mart may have normalized, denormalized, summarized, or dimensional data structures based on individual requirements

Most queries are directed to the dependent data marts

Centralized data warehouse may also be used for querying

Results from adopting a top-down approach to data warehouse development.
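
The feed from hub to spoke can be sketched as a filter plus a summary over the atomic rows held centrally. This is only an illustration under assumed data: the `warehouse` rows, the column names, and `build_dependent_mart` are hypothetical.

```python
# Atomic rows kept in the central warehouse (hypothetical data).
warehouse = [
    {"department": "sales", "region": "north", "amount": 120.0},
    {"department": "sales", "region": "south", "amount": 80.0},
    {"department": "finance", "region": "north", "amount": 40.0},
    {"department": "sales", "region": "north", "amount": 30.0},
]


def build_dependent_mart(atomic_rows, department):
    """Feed one spoke: keep one department's rows and summarize amount by region."""
    summary = {}
    for row in atomic_rows:
        if row["department"] == department:
            region = row["region"]
            summary[region] = summary.get(region, 0.0) + row["amount"]
    return summary


sales_mart = build_dependent_mart(warehouse, "sales")
print(sales_mart)
```

Here the spoke holds summarized data; a different department could equally be fed normalized or dimensional structures from the same atomic rows.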

Page 12: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Building Blocks of Data Warehouses

Page 13: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Source Data

• Production
• Internal
• Archived
• External

Page 14: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Production Data

Page 15: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Internal Data

Page 16: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Archived Data

Backup data

Old data from the operational databases is stored in archived files.

Decisions related to archiving:

• how often
• which portions

Different methods of archiving:

• Recent data is archived to a separate archival database that may still be online.
• Older data is archived to flat files on disk storage.
• The oldest data is archived to tape cartridges or microfilm and may be kept off-site.

A data warehouse keeps historical snapshots of data, because historical data is needed for analysis over time.

Look into your archived data sets. Depending on your data warehouse requirements, you have to include sufficient historical data.

Page 17: ITEC 423 Data Warehousing and Data Mining Lecture 3.

External Data

External data is used especially by decision makers:

• statistics relating to their industry produced by external agencies and national statistical offices.
• market share data of competitors.
• standard values of financial indicators for their business to check on their performance.

Production data and archived data give you a picture based on what you are doing or have done in the past.

This is not enough for understanding industry trends and comparing performance.

Page 18: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Data Staging Component

• Extraction
• Transformation
• Loading

Page 19: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Data Extraction

Deals with numerous data sources

• Source data may be from different source machines in diverse data formats.

Tools are available on the market for data extraction.

• Outside tools suitable for certain data sources
• Develop in-house programs to do the data extraction.

After extraction, where to keep the data for further preparation?

Perform the extraction function in the legacy system

Extract the source into a separate physical environment from which moving the data into the data warehouse would be easier.

• extract the source data into
• a group of flat files
• a data-staging relational database
• a combination of both.
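
As a minimal sketch of the two staging targets named above (flat files and a data-staging relational database), the example below writes the same extracted rows to CSV text and to an SQLite table. The rows, the `stg_customer` table, and both helper functions are hypothetical; an in-memory buffer and an in-memory database stand in for a real file and a real staging server.

```python
import csv
import io
import sqlite3

# Hypothetical rows standing in for data read from a legacy source system.
source_rows = [
    ("1001", "Alice Smith", "2015-11-30"),
    ("1002", "Bob Jones", "2015-12-01"),
]


def extract_to_flat_file(rows):
    """Write the extracted rows as CSV text (stand-in for a flat file on disk)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["customer_id", "name", "signup_date"])
    writer.writerows(rows)
    return buffer.getvalue()


def extract_to_staging_db(rows):
    """Load the extracted rows into a staging table (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stg_customer (customer_id TEXT, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


flat_file_text = extract_to_flat_file(source_rows)
staging = extract_to_staging_db(source_rows)
staged_count = staging.execute("SELECT COUNT(*) FROM stg_customer").fetchone()[0]
print(staged_count)
```

Keeping every staged column as text is a deliberate simplification: type conversion is left to the transformation step.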

Page 20: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Data Transformation

Data for a data warehouse comes from many disparate sources

• Clean the data from each source: misspellings, conflict resolution, missing data, duplicates

• Standardize data elements: data types, lengths, synonyms/homonyms

• Combine related information

• Purge useless data
• Choose appropriate keys
• Summarize if necessary

Data feed is not just an initial load.

• The same (maybe slightly adapted) transformation process will be applied periodically.
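
Several of the transformation steps listed above (cleaning, standardizing synonyms, removing duplicates, summarizing) can be sketched in a few lines. The sample rows, the `COUNTRY_SYNONYMS` mapping, and the choice to drop rows with a missing amount are assumptions for illustration; a real feed might supply default values instead.

```python
# Hypothetical raw rows as they might arrive from different sources.
raw_rows = [
    {"customer": " alice smith ", "country": "USA", "amount": "100.50"},
    {"customer": "Alice Smith", "country": "United States", "amount": "100.50"},
    {"customer": "Bob Jones", "country": "U.S.", "amount": ""},
    {"customer": "Carol White", "country": "Canada", "amount": "20"},
]

# Synonyms mapped to one standard code (assumed mapping).
COUNTRY_SYNONYMS = {"usa": "US", "united states": "US", "u.s.": "US", "canada": "CA"}


def transform(rows):
    """Clean, standardize, and de-duplicate the raw rows."""
    cleaned, seen = [], set()
    for row in rows:
        if not row["amount"]:
            continue  # missing data: this sketch simply drops the row
        record = (
            row["customer"].strip().title(),                    # clean the name
            COUNTRY_SYNONYMS[row["country"].strip().lower()],   # standardize synonyms
            float(row["amount"]),                               # standardize data type
        )
        if record in seen:
            continue  # duplicate after cleaning
        seen.add(record)
        cleaned.append(record)
    return cleaned


def summarize_by_country(records):
    """Summarize the cleaned amounts per standardized country code."""
    totals = {}
    for _, country, amount in records:
        totals[country] = totals.get(country, 0.0) + amount
    return totals


clean_rows = transform(raw_rows)
country_totals = summarize_by_country(clean_rows)
print(clean_rows, country_totals)
```

Because the same functions can be called on every periodic feed, the sketch also reflects the point that transformation is not a one-time initial load.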

Page 21: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Data Loading

Page 22: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Data Storage Component

A separate repository

Large volumes of historical data for analysis

Not for quick retrieval of individual pieces of information

Multidimensional databases store data aggregated at different levels
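
To make "aggregated at different levels" concrete, the sketch below computes the same measure at three levels of detail from one set of atomic rows. The `sales` rows and the `aggregate` helper are hypothetical; an actual multidimensional database would precompute and store such aggregates rather than derive them on demand.

```python
from collections import defaultdict

# Hypothetical atomic sales rows.
sales = [
    {"year": 2015, "month": 11, "product": "Widget", "amount": 100.0},
    {"year": 2015, "month": 11, "product": "Manual", "amount": 20.0},
    {"year": 2015, "month": 12, "product": "Widget", "amount": 50.0},
]


def aggregate(rows, dimensions):
    """Sum the amount for each combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


by_month_product = aggregate(sales, ["year", "month", "product"])  # most detailed
by_month = aggregate(sales, ["year", "month"])                     # rolled up over product
by_year = aggregate(sales, ["year"])                               # rolled up over month
print(by_month, by_year)
```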

Page 23: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Information Delivery Component

Novice user

needs prefabricated reports and preset queries

Casual user

needs prepackaged information once in a while

Business analyst

needs the ability to do complex analysis using the information in the data warehouse

Power user

needs to be able to navigate throughout the data warehouse, pick up interesting data, format his or her own queries, drill through the data layers, and create custom reports and ad hoc queries.

Ad hoc reports

complex queries, multidimensional analysis, and statistical analysis

Page 24: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Information Delivery Component

Page 25: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Information Delivery Component

Information fed into executive information systems (EIS) is meant for senior executives and high-level managers.

Some data warehouses also provide data to data mining applications.

• knowledge discovery systems where the mining algorithms help to discover trends and patterns from the data

In your data warehouse, you may include several information delivery mechanisms.

• online queries and reports
• scheduled reports through e-mail or intranet
• information delivery over the Internet

Page 26: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Metadata Component

Similar to the data dictionary or the data catalog in a DBMS

Data about the data in the data warehouse.

Key architectural component of the data warehouse.

Types of Metadata:

• Operational metadata
• Extraction and transformation metadata
• End-user metadata

Importance of Metadata

• connects all parts of the data warehouse.
• provides information about the contents and structures to the developers.
• makes the contents recognizable to the end users.
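
One way to picture the three metadata types is as three views of a single warehouse column. The entry below is a hypothetical sketch: the field names, the `orders_legacy` source, and the `describe_for_end_user` helper are assumptions and do not come from the lecture or from any particular metadata tool.

```python
# Hypothetical metadata entry for one warehouse column.
customer_id_metadata = {
    # Operational metadata: where the data comes from in the source system.
    "operational": {
        "source_system": "orders_legacy",
        "source_field": "CUST_NO",
        "source_type": "CHAR(8)",
    },
    # Extraction and transformation metadata: how and when it is moved and changed.
    "extraction_transformation": {
        "target_column": "customer_id",
        "rule": "trim whitespace and convert to integer",
        "frequency": "daily",
    },
    # End-user metadata: how the column is presented in business terms.
    "end_user": {
        "business_name": "Customer Number",
        "description": "Unique identifier of a customer",
    },
}


def describe_for_end_user(entry):
    """Return the business-facing description of a column."""
    end_user = entry["end_user"]
    return f'{end_user["business_name"]}: {end_user["description"]}'


label = describe_for_end_user(customer_id_metadata)
print(label)
```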

Page 27: ITEC 423 Data Warehousing and Data Mining Lecture 3.

Management and Control Component

Sits on top of all the other components.

Coordinates the services and activities within the data warehouse.

Controls the data transformation and the data transfer into the data warehouse storage.

Moderates the information delivery to the users.

Works with the database management systems and enables data to be properly stored in the repositories.

Monitors the movement of data into the staging area and from there into the data warehouse storage itself.

Interacts with the metadata component to perform the management and control functions.

Page 28: ITEC 423 Data Warehousing and Data Mining Lecture 3.
