Date post: | 16-Dec-2015 |
Upload: | stewart-hall |
ITEC 423 Data Warehousing and Data Mining
Lecture 3
Architecture
Architecture is the art and science of designing buildings and other structures.
In a system, architecture is a design decision that is usually not easily changed.
There are many different architectural choices available, with different solutions for:
Data transfer
Data staging area
Data storage
Information delivery
A General Data Warehouse Architecture
A General Data Warehouse Architecture with Staging Area
A General Data Warehouse Architecture with Staging Area and Data Marts
Architectural Types
Architectural Types: Centralized Data Warehouse
Takes into account the enterprise-level information requirements
Atomic level data at the lowest level of granularity is stored
Some summarized data may be included
Queries and applications access the central data warehouse.
No separate data marts
Architectural Types: Independent Data Marts
Evolves in companies where the organizational units develop their own data marts for their own specific purposes
Each data mart serves a particular organizational unit
More than one version of the truth may be found
Data marts are independent of one another
Different data marts may have inconsistent data definitions and standards
Such variances hinder analysis of data across data marts.
Architectural Types: Federated
An existing legacy of an assortment of DSS in the form of operational systems, extracted datasets, primitive data marts, …
May not be possible to discard investment and start from scratch
Practical solution is a federated architectural type
Data may be physically or logically integrated through shared key fields, overall global metadata, distributed queries, and other such methods
No one overall data warehouse
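Logical integration through a shared key field can be pictured with a small sketch. The two sources, their field names, and the values below are hypothetical; the point is only that one question is answered from both sources without first copying them into a single warehouse.

```python
# Two existing, separately owned sources (hypothetical data).
orders_system = [
    {"customer_id": 1, "order_total": 120.0},
    {"customer_id": 2, "order_total": 75.5},
    {"customer_id": 1, "order_total": 30.0},
]
billing_mart = [
    {"customer_id": 1, "billed": 150.0},
    {"customer_id": 2, "billed": 70.0},
]


def federated_query(orders, billing):
    """Combine both sources at query time using the shared key customer_id."""
    ordered = {}
    for row in orders:
        cid = row["customer_id"]
        ordered[cid] = ordered.get(cid, 0.0) + row["order_total"]
    billed = {row["customer_id"]: row["billed"] for row in billing}
    # Only customers known to both sources can be compared.
    shared_keys = sorted(ordered.keys() & billed.keys())
    return {cid: {"ordered": ordered[cid], "billed": billed[cid]} for cid in shared_keys}


result = federated_query(orders_system, billing_mart)
```

The sketch works only because both sources agree on what `customer_id` means, which is the role the shared key fields and global metadata play in this architecture.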
Architectural Types: Data-Mart Bus
Conformed supermarts approach
Analyzing requirements for a specific business subject such as orders, shipments, billings, insurance claims, car rentals ...
Build the first data mart (supermart) using business dimensions and metrics
These business dimensions will be shared in the future data marts.
Conform dimensions among the various data marts
Result would be logically integrated supermarts that will provide an enterprise view of the data
Data marts contain atomic data organized as a dimensional data model
Results from adopting an enhanced bottom-up approach to data warehouse development
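The idea of conforming dimensions can be sketched as follows. The product dimension, the two fact tables, and all values are hypothetical; the sketch assumes both supermarts use the same product keys, so their results can be compared category by category.

```python
# One conformed dimension shared by every data mart (hypothetical data).
product_dim = {
    101: {"name": "Widget", "category": "Hardware"},
    102: {"name": "Gadget", "category": "Hardware"},
    103: {"name": "Manual", "category": "Books"},
}

# First supermart: orders.
orders_facts = [
    {"product_key": 101, "qty_ordered": 10},
    {"product_key": 102, "qty_ordered": 4},
    {"product_key": 103, "qty_ordered": 7},
]

# A later supermart: shipments, built on the same product keys.
shipments_facts = [
    {"product_key": 101, "qty_shipped": 8},
    {"product_key": 103, "qty_shipped": 7},
]


def total_by_category(facts, measure):
    """Roll a measure up to the category level of the shared dimension."""
    totals = {}
    for row in facts:
        category = product_dim[row["product_key"]]["category"]
        totals[category] = totals.get(category, 0) + row[measure]
    return totals


ordered = total_by_category(orders_facts, "qty_ordered")
shipped = total_by_category(shipments_facts, "qty_shipped")

# Because the dimension is conformed, the two marts can be combined directly.
backlog = {category: ordered[category] - shipped.get(category, 0) for category in ordered}
```

If each mart had defined its own product categories, the final comparison would not be meaningful, which is the problem the independent data marts run into.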
Architectural Types: Hub and Spoke
Similar to the centralized data warehouse architecture: enterprise-wide data warehouse
Atomic data is stored in the centralized data warehouse
The centralized data warehouse feeds data to the dependent data marts on the spokes
Dependent data marts may be developed for departmental analytical needs, specialized queries, data mining ...
Dependent data mart may have normalized, denormalized, summarized, or dimensional data structures based on individual requirements
Most queries are directed to the dependent data marts
Centralized data warehouse may also be used for querying
Results from adopting a top-down approach to data warehouse development.
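A minimal sketch of the hub feeding its spokes, assuming a hypothetical list of atomic sales rows as the centralized warehouse and two summarized dependent marts derived from it. The function name `feed_mart` and the sample data are illustrative only.

```python
# Atomic data held in the centralized data warehouse (hypothetical rows).
central_warehouse = [
    {"date": "2015-01-05", "region": "North", "amount": 100},
    {"date": "2015-01-20", "region": "South", "amount": 50},
    {"date": "2015-02-03", "region": "North", "amount": 25},
]


def feed_mart(atomic_rows, key_fn):
    """Build a summarized dependent data mart from the atomic rows."""
    mart = {}
    for row in atomic_rows:
        key = key_fn(row)
        mart[key] = mart.get(key, 0) + row["amount"]
    return mart


# Each spoke gets the structure its department needs.
sales_mart = feed_mart(central_warehouse, lambda row: row["region"])
finance_mart = feed_mart(central_warehouse, lambda row: row["date"][:7])  # year-month
```

Both marts are derived from the same atomic rows, so they cannot disagree with each other in the way independent data marts can.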
Building Blocks of Data Warehouses
Source Data: Production, Internal, Archived, External
Production Data
Internal Data
Archived Data
Backup data
Old data from the operational databases is stored in archived files.
Decisions related to archiving
how often
which portions
Different methods of archiving
Recent data archived to a separate archival database that may still be online.
Older data archived to flat files on disk storage.
Oldest data archived to tape cartridges or microfilm may be kept off-site.
Data warehouse keeps historical snapshots of data.
Historical data is needed for analysis over time.
Look into your archived data sets. Depending on your data warehouse requirements, you have to include sufficient historical data.
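The three archiving methods can be read as tiers chosen by the age of the data. The sketch below assumes hypothetical cut-offs of 2 and 7 years; the source does not give thresholds, so these are placeholders an organization would set for itself.

```python
from datetime import date


def archive_tier(record_date, today, online_years=2, disk_years=7):
    """Pick an archive tier by age. The year thresholds are assumed, not prescribed."""
    age_years = (today - record_date).days / 365.25
    if age_years <= online_years:
        return "separate archival database (online)"
    if age_years <= disk_years:
        return "flat files on disk storage"
    return "tape cartridges or microfilm (off-site)"


today = date(2015, 12, 16)
tiers = [archive_tier(d, today) for d in (date(2015, 1, 1), date(2010, 1, 1), date(2000, 1, 1))]
```

When gathering history for the warehouse, a mapping like this indicates which tiers have to be read to cover the required time span.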
External Data
External data is used especially by decision makers
statistics relating to their industry produced by external agencies and national statistical offices.
market share data of competitors.
standard values of financial indicators for their business to check on their performance.
Production data and archived data give you a picture based on what you are doing or have done in the past.
They are not enough for understanding industry trends and comparing performance.
Data Staging Component: Extraction, Transformation, Loading
Data Extraction
Deals with numerous data sources
• Source data may be from different source machines in diverse data formats.
Tools are available on the market for data extraction.
• Use outside tools suitable for certain data sources
• Develop in-house programs to do the data extraction
After extraction, where do you keep the data for further preparation?
• Perform the extraction function in the legacy system
• Extract the source into a separate physical environment from which moving the data into the data warehouse would be easier. There, extract the source data into:
  • a group of flat files
  • a data-staging relational database
  • a combination of both
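An in-house extraction program of the kind mentioned above might look like the following sketch. It assumes two hypothetical sources in different formats (in-memory records and pipe-delimited text) and writes both into one flat file layout; an in-memory buffer stands in for the staging file.

```python
import csv
import io

# Source A: records already parsed from one source machine (hypothetical).
source_a = [{"cust_no": "001", "name": "Ann"}]
# Source B: pipe-delimited text exported by another system (hypothetical).
source_b = "id|full_name\n002|Bob\n"


def extract_to_staging(rows_a, text_b):
    """Extract both sources into a single flat-file layout for the staging area."""
    staging = io.StringIO()  # stands in for a flat file on disk
    writer = csv.DictWriter(
        staging, fieldnames=["source", "customer_id", "name"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows_a:
        writer.writerow({"source": "A", "customer_id": row["cust_no"], "name": row["name"]})
    for row in csv.DictReader(io.StringIO(text_b), delimiter="|"):
        writer.writerow({"source": "B", "customer_id": row["id"], "name": row["full_name"]})
    return staging.getvalue()


staged = extract_to_staging(source_a, source_b)
```

The `source` column is an assumption added here so that later steps can tell where each staged row came from.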
Data Transformation
Data for a data warehouse comes from many disparate sources
• Clean the data from each source: misspellings, conflict resolution, missing data, duplicates
• Standardize data elements: data types, lengths, synonyms/homonyms
• Combine related information
• Purge useless data
• Choose appropriate keys
• Summarize if necessary
Data feed is not just an initial load.
• Same (maybe slightly adapted) transformation process will be applied periodically.
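Several of the transformation tasks listed above can be sketched in one small function. The field names, the synonym table, and the choice of 0.0 as the default for a missing amount are all assumptions made for illustration.

```python
# Hypothetical synonym table used to standardize state values.
STATE_SYNONYMS = {"calif.": "CA", "california": "CA", "ca": "CA", "n.y.": "NY", "ny": "NY"}


def transform(rows):
    """Clean, standardize, de-duplicate, and purge staged rows."""
    seen_keys = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()           # clean stray spaces and casing
        if not name:
            continue                                  # purge rows that are useless without a name
        customer_id = int(row["customer_id"])         # standardize the key's data type
        if customer_id in seen_keys:
            continue                                  # drop duplicates on the chosen key
        seen_keys.add(customer_id)
        state = STATE_SYNONYMS.get(row["state"].strip().lower(), "UNKNOWN")
        amount = float(row["amount"] or 0)            # assumed default for missing data
        cleaned.append({"customer_id": customer_id, "name": name, "state": state, "amount": amount})
    return cleaned


staged_rows = [
    {"customer_id": "001", "name": " ann lee ", "state": "Calif.", "amount": "10.5"},
    {"customer_id": "1", "name": "Ann Lee", "state": "CA", "amount": "10.5"},
    {"customer_id": "2", "name": "bob ray", "state": "N.Y.", "amount": ""},
    {"customer_id": "3", "name": "", "state": "NY", "amount": "4"},
]
transformed = transform(staged_rows)
```

Because the same function can be rerun on each periodic feed, it also reflects the point that transformation is not a one-time step for the initial load.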
Data Loading
Data Storage Component
A separate repository
Large volumes of historical data for analysis
not for quick retrieval of individual pieces of information
multidimensional databases store data aggregated at different levels
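Storing data aggregated at different levels can be illustrated with a short sketch. The fact rows and the `aggregate` helper are hypothetical; a real multidimensional database would precompute and store such aggregates rather than derive them on request.

```python
from collections import defaultdict

# Hypothetical atomic sales facts.
facts = [
    {"year": 2014, "month": 12, "product": "Widget", "sales": 5},
    {"year": 2015, "month": 1, "product": "Widget", "sales": 7},
    {"year": 2015, "month": 1, "product": "Gadget", "sales": 3},
    {"year": 2015, "month": 2, "product": "Widget", "sales": 2},
]


def aggregate(rows, *levels):
    """Sum sales for every combination of the given dimension levels."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["sales"]
    return dict(totals)


by_year = aggregate(facts, "year")
by_month = aggregate(facts, "year", "month")
by_year_product = aggregate(facts, "year", "product")
```

Each result is the same data at a different grain, which suits analysis over large volumes of history rather than quick retrieval of one record.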
Information Delivery Component
novice user
need prefabricated reports and preset queries
casual user
need prepackaged information once in a while
business analyst
need ability to do complex analysis using the information in the data warehouse
power user
need to be able to navigate throughout the data warehouse, pick up interesting data, format his or her own queries, drill through the data layers, and create custom reports and ad hoc queries.
Ad hoc reports
complex queries, multidimensional analysis, and statistical analysis
Information Delivery Component
• knowledge discovery systems where the mining algorithms help to discover trends and patterns from the data
• online queries and reports
• scheduled reports through e-mail or intranet
• information delivery over the Internet
Information Delivery Component
Information fed into executive information systems (EIS) is meant for senior executives and high-level managers.
Some data warehouses also provide data to data mining applications.
In your data warehouse, you may include several information delivery mechanisms.
Metadata Component
Similar to the data dictionary or the data catalog in a DBMS
Data about the data in the data warehouse.
key architectural component of the data warehouse.
Types of Metadata:
• Operational metadata
• Extraction and transformation metadata
• End-user metadata
Importance of Metadata:
• connects all parts of the data warehouse.
• provides information about the contents and structures to the developers.
• makes the contents recognizable to the end users.
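One way to picture the three types of metadata is a catalog entry per warehouse column. The class, the field names, and the two entries below are hypothetical and show only how one record can serve both developers and end users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    warehouse_column: str
    source_system: str    # operational metadata: where the data originates
    source_field: str
    transformation: str   # extraction and transformation metadata
    business_name: str    # end-user metadata: a name users recognize


# Hypothetical catalog entries.
CATALOG = {
    "cust_state": ColumnMetadata(
        "cust_state", "order_entry", "ST_CD",
        "map synonyms to two-letter codes", "Customer state",
    ),
    "net_sales": ColumnMetadata(
        "net_sales", "billing", "INV_AMT",
        "sum invoice lines minus returns", "Net sales amount",
    ),
}


def lineage(column):
    """Developer view: trace a warehouse column back to its source."""
    m = CATALOG[column]
    return f"{m.source_system}.{m.source_field} -> {m.warehouse_column} ({m.transformation})"


def business_glossary():
    """End-user view: warehouse columns under recognizable names."""
    return {m.warehouse_column: m.business_name for m in CATALOG.values()}
```

For example, `lineage("cust_state")` gives a developer the source field and the rule applied, while `business_glossary()` gives an end user readable names for the same columns.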
Management and Control Component
sits on top of all the other components.
coordinates the services and activities within the data warehouse.
controls the data transformation and the data transfer into the data warehouse storage.
moderates the information delivery to the users.
works with the database management systems and enables data to be properly stored in the repositories.
monitors the movement of data into the staging area and from there into the data warehouse storage itself.
interacts with the metadata component to perform the management and control functions.
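The coordinating and monitoring role can be sketched as a small driver that runs the staging steps in order and records how many rows move at each stage. Every function and the sample rows here are hypothetical stand-ins for real extraction, transformation, and loading routines.

```python
warehouse = []  # stands in for the data warehouse storage


def extract():
    """Move source rows into the staging area (hypothetical rows)."""
    return [{"id": 1, "amount": "10"}, {"id": 2, "amount": None}, {"id": 3, "amount": "5"}]


def transform(rows):
    """Keep only rows with an amount and convert it to an integer."""
    return [{"id": r["id"], "amount": int(r["amount"])} for r in rows if r["amount"] is not None]


def load(rows):
    """Append transformed rows to the warehouse and report how many were stored."""
    warehouse.extend(rows)
    return len(rows)


def run_load_cycle():
    """Coordinate the steps and monitor data movement at each stage."""
    log = []
    staged = extract()
    log.append(("staged", len(staged)))
    clean = transform(staged)
    log.append(("transformed", len(clean)))
    log.append(("rejected", len(staged) - len(clean)))
    log.append(("loaded", load(clean)))
    return log


audit_log = run_load_cycle()
```

The audit log is the kind of information this component would exchange with the metadata component, so that each load can be traced afterwards.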