7/29/2019 DW Lecture 05
Lecture 05
Tue, Feb 17, 2009, 18:00 to 21:00
FAST NU, Karachi
Architectural Components: 3 Major Areas
Data Acquisition: Extraction, Transformation, Cleansing, Integration, Staging
Data Storage: Loading, Archiving, Management
Information Delivery: Reports, Query Processing, Complex Analysis
Building Blocks of the Data Warehouse: Source Data, Data Staging, Data Storage, Information Delivery, Metadata, Management and Control
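To make the three areas concrete, here is a minimal Python sketch, assuming tiny in-memory records in place of real source systems and a DBMS. The function names `acquire`, `store`, and `deliver`, the source-system names, and the sample rows are all hypothetical; the point is only that acquisition (extract, cleanse, transform, integrate), storage (load), and delivery (query) run as separate stages.

```python
"""Toy sketch of the three architectural areas. All names and sample
records are hypothetical, chosen only for illustration."""

from collections import defaultdict

# Hypothetical source data from two source systems.
SOURCE_SYSTEMS = {
    "orders_db": [
        {"cust": " alice ", "amount": "120.50"},
        {"cust": "BOB", "amount": "80"},
    ],
    "online_sales": [
        {"cust": "Alice", "amount": "30"},
        {"cust": None, "amount": "15"},  # incomplete record, dropped during cleansing
    ],
}


def acquire(sources):
    """Data acquisition: extract, cleanse, transform, and integrate into staging."""
    staging = []
    for system, rows in sources.items():          # extraction
        for row in rows:
            if not row["cust"]:                   # cleansing: drop incomplete rows
                continue
            staging.append({
                "customer": row["cust"].strip().title(),  # transformation: standardize names
                "amount": float(row["amount"]),
                "source": system,                 # integration: keep lineage
            })
    return staging


def store(staging):
    """Data storage: load staged rows into a warehouse table (here, a list)."""
    return list(staging)  # a real load would write to a DBMS


def deliver(warehouse):
    """Information delivery: a simple query, total sales per customer."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


report = deliver(store(acquire(SOURCE_SYSTEMS)))
print(report)  # {'Alice': 150.5, 'Bob': 80.0}
```

The two differently formatted "Alice" rows are standardized and integrated into one customer, and the row with no customer never reaches storage.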
Architectural Components
Figure: Source Data (Production, Internal, Archived, External) → Data Staging → Data Storage (Data Warehouse, Data Marts, MDDB) → Information Delivery (Reports / Queries, OLAP, Data Mining), with Metadata and Management & Control alongside. Areas: Data Acquisition, Data Storage, Information Delivery.
Infrastructure Supporting Architecture
Operational infrastructure: People, Procedures, Training, Management Software
Physical infrastructure: Hardware, Operating System, DBMS, Network Software
Platform Options: Single Platform Option, Hybrid Option
Source Data Platforms
Staging Area Platforms. Options for the staging area: Source Data Platforms, Data Storage Platforms, Separate Platforms
Data Movement Options: Shared Disk, Mass Data Transmission, Real-Time Connection, Manual Methods
Server Hardware
SMP (Symmetric Multiprocessing)
Clusters
MPP (Massively Parallel Processing)
ccNUMA or NUMA (Cache-coherent Nonuniform Memory Architecture)
Symmetric Multiprocessing
Features: Shared-everything architecture; simplest form of parallel processing
Benefits: Proven technology since 1970; workload balance; scalable performance; easy administration
Limitations: Limited available memory; limited bandwidth; limited availability
Consideration: Suitable when the data warehouse size is two to three hundred gigabytes and concurrency requirements are reasonable
Symmetric Multiprocessing
Figure: four processors on a common bus, all sharing one memory and shared disks.
Clusters
Features: Each node has one or more processors and associated memory; memory is shared within each node only; high-speed bus communication; shared disks; a cluster of nodes
Benefits: High availability; preserves the concept of one database; incremental growth
Limitations: Bus bandwidth; high O/S overhead; cache consistency must be maintained for inter-node synchronization
Consideration: Suitable if the data warehouse is expected to grow in well-defined increments
Clusters
Figure: two nodes, each with two processors sharing the node's memory, connected by a common high-speed bus to shared disks.
Massively Parallel Processing
Features: Shared-nothing architecture; focus on disk access rather than memory access; works well with an O/S that supports transparent disk access; inter-node communication through processor-to-processor connections
Benefits: Highly scalable; fast access between nodes; improved system availability; low cost per node
Limitations: Requires rigid data partitioning; restricted data access; limited workload balance; cache consistency must be maintained
Consideration: Medium to large data warehouses of four to five hundred gigabytes
Massively Parallel Processing
Figure: four nodes, each with its own processor, memory, and disk.
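The shared-nothing idea, and why MPP requires rigid data partitioning, can be sketched in Python as follows. This is an illustration under simplifying assumptions: four simulated nodes (plain lists in one process), hash partitioning on a region key with `zlib.crc32`, and a coordinator that merges partial sums. Real MPP databases use their own partitioning and interconnect schemes; all names and data here are hypothetical.

```python
"""Sketch of shared-nothing (MPP-style) aggregation. Node count, data,
and function names are hypothetical."""

from collections import Counter
from zlib import crc32

NUM_NODES = 4


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    # Deterministic hash so the same key always lands on the same node
    # (Python's built-in hash() for str is randomized per process).
    return crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes=NUM_NODES):
    """Rigid data partitioning: each row is owned by exactly one node."""
    nodes = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        nodes[node_for(region, num_nodes)].append((region, amount))
    return nodes


def local_aggregate(node_rows):
    """Runs on one node, touching only that node's own data."""
    totals = Counter()
    for region, amount in node_rows:
        totals[region] += amount
    return totals


def coordinator_merge(partials):
    """Combines the per-node partial totals sent over the interconnect."""
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds counts
    return dict(merged)


sales = [("north", 10), ("south", 5), ("east", 7), ("north", 3), ("west", 2), ("south", 1)]
nodes = partition(sales)
result = coordinator_merge(local_aggregate(n) for n in nodes)
print(result)
```

Because every row for a given region lives on one node, each node aggregates without touching another node's memory or disk. The same property explains the "limited workload balance" limitation: if one region dominated the data, its node would do most of the work.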
Cache-coherent Nonuniform Memory Architecture
Features: New architecture since the early 1990s; a big SMP broken into smaller SMPs; a single real memory address space over the entire machine
Benefits: Maximum flexibility; overcomes the memory limitations of SMP; better scalability than SMP; partitioning with a centralized approach
Limitations: Complex programming; limited software support; still maturing
Consideration: For experienced technology users
Cache-coherent Nonuniform Memory Architecture
Figure: two nodes, each with two processors, a shared memory, and disks.
Software Tools
Data Modeling
Data Extraction
Data Transformation
Data Loading
Data Quality
Queries and Reports
OLAP
Alert Systems
Middleware and Connectivity
Data Warehouse Management
Architecture First, Then Tools
Metadata Definitions
Data about data
Table of contents for the data
Catalog for the data
Data warehouse atlas
Data warehouse roadmap
Data warehouse directory
The nerve center
Metadata Example
Entity Name: Customer
Alias Names: Account, Client
Definition: A person or an organization that purchases goods or services from the company
Remarks: It includes regular, current, and past customers
Source Systems: Finished Goods Orders, Maintenance Contracts, Online Sales
Created Date: January 15, 1999
Last Update Date: January 21, 2001
Update Cycle: Weekly
Last Full Refresh: December 29, 2000
Full Refresh Cycle: Every Six Months
Data Quality Reviewed: January 25, 2001
Last Deduplication: January 10, 2001
Planned Archival: Every Six Months
Responsible User: Jane Brown
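A metadata entry like the Customer example can be held as a structured record so that tools and users can query it. The sketch below assumes a Python dataclass with hypothetical field names (no standard schema is implied); the `matches` helper, also hypothetical, shows one use: resolving alias names so that a user who searches for "Client" finds the Customer entity.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class EntityMetadata:
    """One business-metadata entry. Field names are hypothetical, not a standard schema."""
    entity_name: str
    alias_names: List[str]
    definition: str
    remarks: str
    source_systems: List[str]
    created: date
    last_update: date
    update_cycle: str
    last_full_refresh: date
    full_refresh_cycle: str
    data_quality_reviewed: date
    last_deduplication: date
    planned_archival: str
    responsible_user: str

    def matches(self, term: str) -> bool:
        """True if the search term equals the entity name or one of its aliases (case-insensitive)."""
        names = [self.entity_name] + self.alias_names
        return term.lower() in (n.lower() for n in names)


customer = EntityMetadata(
    entity_name="Customer",
    alias_names=["Account", "Client"],
    definition="A person or an organization that purchases goods or services from the company",
    remarks="It includes regular, current, and past customers",
    source_systems=["Finished Goods Orders", "Maintenance Contracts", "Online Sales"],
    created=date(1999, 1, 15),
    last_update=date(2001, 1, 21),
    update_cycle="Weekly",
    last_full_refresh=date(2000, 12, 29),
    full_refresh_cycle="Every Six Months",
    data_quality_reviewed=date(2001, 1, 25),
    last_deduplication=date(2001, 1, 10),
    planned_archival="Every Six Months",
    responsible_user="Jane Brown",
)

print(customer.matches("client"))  # True
```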
Need for Metadata
For using the data warehouse
For building the data warehouse
For administering the data warehouse
Who needs it? IT professionals, power users, casual users
A Nerve Center
Figure: data warehouse metadata at the center, linked to Source Systems, External Data, Extraction Tools, Cleansing Tools, Transformation Tools, Data Load Function, Applications, Data Mining, OLAP Tool, Reporting Tool, and Query Tool.
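The nerve-center idea, where every tool around the warehouse writes to or reads from one metadata store, can be sketched as a small repository class. This is a toy, assuming an in-memory dictionary; `MetadataRepository`, its methods, and the tool descriptions and table name are hypothetical, not any product's API.

```python
"""Toy metadata repository. Class, method, tool, and table names are hypothetical."""

from collections import defaultdict


class MetadataRepository:
    def __init__(self):
        # element name -> list of (tool, fact) entries contributed by tools
        self._entries = defaultdict(list)

    def record(self, tool: str, element: str, fact: str) -> None:
        """Called by back-end tools (extraction, cleansing, transformation, load)."""
        self._entries[element].append((tool, fact))

    def describe(self, element: str) -> list:
        """Called by front-end tools (query, reporting, OLAP, data mining)."""
        return [f"{tool}: {fact}" for tool, fact in self._entries.get(element, [])]


repo = MetadataRepository()
repo.record("Extraction Tool", "sales_fact", "extracted from order entry system")
repo.record("Cleansing Tool", "sales_fact", "removed rows with missing customer id")
repo.record("Data Load Function", "sales_fact", "loaded nightly")

# A query tool can explain where the data came from before a user runs a query.
for line in repo.describe("sales_fact"):
    print(line)
```

The design choice being illustrated is the single shared store: acquisition tools and delivery tools never talk to each other directly, only through the metadata.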
Metadata by Functional Areas
Data Acquisition: Extraction, Transformation, Cleansing, Integration, Staging
Data Storage: Loading, Archiving, Management
Information Delivery: Reports, Query Processing, Complex Analysis
Business Metadata
Technical Metadata
Metadata Requirements
Capturing and Storing Data
Variety of Metadata Sources
Metadata Integration
Metadata Standardization
Rippling through Revisions
Keeping Metadata Synchronized
Metadata Exchange
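One way to picture rippling through revisions is impact analysis over lineage metadata: when a source field changes, every element derived from it may need revising. The Python sketch below assumes the lineage is stored as a simple graph of hypothetical element names and uses breadth-first search to list the affected elements.

```python
"""Impact analysis over a lineage graph. The graph and element names are hypothetical."""

from collections import deque

# Edges point from an element to the elements derived from it.
LINEAGE = {
    "orders.cust_no": ["stage.customer_id"],
    "stage.customer_id": ["dw.customer_dim.customer_key", "dw.sales_fact.customer_key"],
    "dw.sales_fact.customer_key": ["report.sales_by_customer"],
    "dw.customer_dim.customer_key": [],
    "report.sales_by_customer": [],
}


def affected_by(changed: str, lineage: dict) -> list:
    """Breadth-first search returning all elements downstream of `changed`."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:   # visit each element once, even if reachable twice
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(affected_by("orders.cust_no", LINEAGE))
```

Keeping metadata synchronized then amounts to revisiting each element in the returned list; here a change to `orders.cust_no` reaches the staging column, both warehouse keys, and the report.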
Metadata Sources
Source Systems
Data Extraction
Data Transformation and Cleansing
Data Loading
Data Storage
Information Delivery