Capacity management for ETL System

Date post: 14-Jul-2015
Upload: ashok-bhatla
Transcript
Page 1: Capacity management for ETL System

Capacity Model of an ETL system

Ashok Bhatla
Email – [email protected]

Page 2: Capacity management for ETL System

What is Business Intelligence?

Business Intelligence (BI) is a combination of tools, processes and software that helps a company transform data into actionable knowledge, allowing it to make faster, better-informed decisions in order to achieve its strategic goals.

It’s all about providing the right information to management at the right time at the lowest possible cost.

As we are drowning in data but starving for knowledge, Business Intelligence has become the No. 1 priority for IT managers today.

Page 3: Capacity management for ETL System

What is ETL?

ETL stands for Extract, Transform and Load. A transactional system is meant to be a high-performance system so that users can get their work done faster. Running reports against a transactional system slows it down. Therefore, the concept of ETL gained popularity.

In computing, Extract, Transform and Load (ETL) refers to a process in database usage that involves the following steps:

- Extracts data from outside sources.
- Transforms it to fit operational needs, which can include joining or reformatting some tables.
- Loads it into the end target (a database or, more specifically, an operational data store, data mart, or data warehouse).
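The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not any particular ETL tool: the table names (`sales`, `sales_by_region`), the columns and the sample rows are hypothetical, and in-memory SQLite databases stand in for the source OLTP system and the reporting target.

```python
import sqlite3


def extract(source):
    """Extract: read raw rows from the (hypothetical) source table."""
    return source.execute("SELECT region, amount FROM sales").fetchall()


def transform(rows):
    """Transform: aggregate amounts per region to fit the reporting need."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return sorted(totals.items())


def load(target, rows):
    """Load: write the transformed rows into the target table."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT, total REAL)"
    )
    target.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)
    target.commit()


# Stand-ins for an OLTP source and a reporting target (illustrative data only).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 50.0), ("east", 25.0)],
)

target = sqlite3.connect(":memory:")
load(target, transform(extract(source)))
result = target.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(result)  # [('east', 125.0), ('west', 50.0)]
```

Because the aggregation runs outside the source system, the reporting workload never touches the transactional tables beyond the single extract query.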

Page 4: Capacity management for ETL System

Example of ETL

[Diagram: OLTP Systems (Payroll Data, Sales Data, Purchasing Data) -> ETL (joins, transforms, deletes etc.) -> Staged Data -> Load Data -> Cost Accounting System, EDW / Reporting Data]

Page 5: Capacity management for ETL System

What is Capacity Planning?

Capacity planning is the process of identifying the current computing needs of a business application and forecasting its future computing needs based on the business plans.

In other words, it determines what computing resources are needed to meet an application’s service level objectives over a period of time.

In today’s economic climate, business requirements can change rapidly depending on an organization’s strategy and goals. Properly managed capacity plans should therefore be able to take unforeseen requirements into account.

Capacity planning can be done in a very casual manner, or with organized and disciplined methodologies.

The more data-driven the capacity planning is, the more accurate the results.

Page 6: Capacity management for ETL System

Capacity Planning of an IT System

- Software Licenses, No. of Users
- Servers, Storage, Networking, CPU
- Data Center Space, Power, Cooling

Capacity planning needs to ensure that all hardware (disks, memory, CPU and network), software resources (user licenses) and facilities are optimally used.

Page 7: Capacity management for ETL System

Capacity Planning

Proactive Capacity Planning

- Avoid downtime by reducing the no. of incidents
- Achieve optimal utilization of computing resources
- Reduce TCO for the ETL system
- Achieve the performance objectives established by the business

If no corrective action is taken based on the measured data, then capacity planning is of no use.

We cannot manage something which we cannot measure.

Page 8: Capacity management for ETL System

Capacity Planning Steps

Identify Service Level Objectives – know the requirements in business terms

Analyze Current Capacity – Gather data about resource consumption, idle times and peak usage

Plan for Future Capacity – Know the future business needs and how the IT systems will handle the increased load
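The three steps can be tied together in a small headroom calculation. The sketch below is an assumption-laden illustration, not a method from the slides: the function name `periods_of_headroom` is hypothetical, the service level objective is reduced to a single utilization ceiling, and demand is assumed to grow at a constant compound rate per period.

```python
import math


def periods_of_headroom(current_peak, capacity_limit, growth_rate):
    """Whole periods for which peak demand stays within the limit.

    current_peak   -- measured peak usage today (step 2)
    capacity_limit -- ceiling derived from the service level objective (step 1)
    growth_rate    -- expected growth per period from business plans (step 3),
                      e.g. 0.05 for 5% (assumed compound growth)
    """
    if current_peak >= capacity_limit:
        return 0  # already at or over the limit
    if growth_rate <= 0:
        return None  # demand is flat or shrinking; the limit is never reached
    # Solve current_peak * (1 + growth_rate) ** n <= capacity_limit for n.
    n = math.log(capacity_limit / current_peak) / math.log(1 + growth_rate)
    return math.floor(n)


# Hypothetical numbers: 60% peak CPU today, 80% ceiling, 5% growth per month.
print(periods_of_headroom(60, 80, 0.05))  # 5
```

With these illustrative inputs the peak stays under the ceiling for five months (about 76.6%) and crosses it in the sixth (about 80.4%), which is the point by which extra capacity would need to be in place.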

Page 9: Capacity management for ETL System

Strike a Balance

[Diagram: a balance of Supply and Demand; labels: Cost, Resources, Performance, Utilization]

As per Parkinson’s Law, if you give more resources to customers, they will find ways to use them. IT managers cannot keep giving unlimited resources to users.

As per Moore’s Law, IT gets cheaper and faster roughly every 18 months. But organizations cannot wait for the next generation of technology to become available, as they need to take care of business now.

Page 10: Capacity management for ETL System

Capacity Challenges for ETL Systems

ETL jobs are of different types (some Full Refresh, some Delta Refresh), process varying amounts of data and are scheduled at different frequencies. Therefore, there are always spikes and valleys in the workload.

An enterprise ETL system processes thousands of batch jobs on a daily basis. These systems connect to a large no. of data sources which reside on different platforms and may be on different networks across the WAN.

SQL queries in a transactional system are typically simple and do not require parallelism. In an ETL system, on the other hand, very large datasets are processed, and workloads are random in nature. This makes it difficult to predict the resource requirements.

Different types of users have different peak usage requirements. They have different needs for transaction times, elapsed times and response times.

Page 11: Capacity management for ETL System

Disk Capacity Issues – Engineers spend a lot of time cleaning up old, stale data

Over Capacity – Extra compute capacity has been paid for, but is not being utilized

Network Slowness Problems – Batch jobs sometimes run slowly

No. of User Licenses reaching limits

Page 12: Capacity management for ETL System

Analyse the Complete Picture

Business Terms

- User Needs: Transaction Time, Response Time, Elapsed Time, Throughput Time
- Data Usage Patterns (Financial, Marketing or Factory Data)
- Data Complexity (Type of SQL Queries or ETL Transformations)
- Volume and Frequency of Data Loads (No. of Batch Jobs and GB of data processed)
- User Profile (Simple User or Advanced Data Miner)

Technical Terms

- Storage (SAN / NAS / Local Disks)
- Processing Power (CPU, No. of Cores)
- Network Bandwidth (Transfer Rate, Bytes Tx/Rx)
- Memory (Physical, Cache, Swap)

Page 13: Capacity management for ETL System

Capacity Planning Tools

- Simulation: Accurate, but needs lots of time for setup
- Testing: Costly, as another environment similar to Production is needed
- Trending: Can be done using Excel. Simple, but does not take non-linear behavior into account
- Analytical Modeling: More advanced, faster and accurate
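Trending, as described above, amounts to fitting a straight line through past measurements and extending it. The following is a minimal sketch of that idea using an ordinary least-squares fit; the function names and the weekly storage figures are hypothetical, and, like the spreadsheet approach, it ignores non-linear behavior.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, periods_ahead):
    """Extend the fitted line periods_ahead steps past the last observation."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


# Hypothetical storage consumption in GB for five consecutive work weeks.
weekly_storage_gb = [500, 520, 540, 560, 580]
print(forecast(weekly_storage_gb, 4))  # 660.0
```

A straight line is a reasonable first model when growth is steady; workloads that saturate a resource or grow in steps would need one of the other approaches in the list.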

Vectors of Measurement

Availability

Performance

Throughput

Utilization

Quality

Efficiency

Page 14: Capacity management for ETL System

Data Collection

- Period (WW or Month)
- No. of Subject Areas
- No. of Projects
- No. of ETL Batch Jobs
- Storage Consumption
- CPU
- Network
- Disk I/O
- Tx/Rx Bytes

How do we collect Performance / Capacity Data?

- OS monitoring tools – even freeware like Nagios, kSar, SQLMon, PerfMon
- Data collected in SQL tables
- Data collected by the software used by the storage frames – gives utilization, capacity and performance data
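As one way to picture "data collected in SQL tables", the sketch below stores capacity samples in SQLite and reads back the peak value per period. The schema, the table name `capacity_samples`, the metric name `cpu_pct` and the sample values are all assumptions made for illustration; a real deployment would be fed by whichever monitoring tool is in use.

```python
import sqlite3

# Hypothetical schema: one row per measurement.
SCHEMA = """
CREATE TABLE IF NOT EXISTS capacity_samples (
    period       TEXT NOT NULL,  -- work week or month, e.g. 'WW27'
    subject_area TEXT NOT NULL,
    metric       TEXT NOT NULL,  -- e.g. 'cpu_pct', 'storage_gb'
    value        REAL NOT NULL
)
"""


def record_sample(conn, period, subject_area, metric, value):
    """Insert one measurement into the samples table."""
    conn.execute(
        "INSERT INTO capacity_samples VALUES (?, ?, ?, ?)",
        (period, subject_area, metric, value),
    )


def peak_by_period(conn, metric):
    """Return (period, peak value) pairs for one metric, ordered by period."""
    return conn.execute(
        "SELECT period, MAX(value) FROM capacity_samples "
        "WHERE metric = ? GROUP BY period ORDER BY period",
        (metric,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Illustrative samples only.
record_sample(conn, "WW27", "finance", "cpu_pct", 40.0)
record_sample(conn, "WW27", "marketing", "cpu_pct", 65.0)
record_sample(conn, "WW28", "finance", "cpu_pct", 55.0)
conn.commit()

peaks = peak_by_period(conn, "cpu_pct")
print(peaks)  # [('WW27', 65.0), ('WW28', 55.0)]
```

Keeping one row per measurement, rather than one column per metric, makes it easy to add new metrics later without changing the table.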

Page 15: Capacity management for ETL System

Capacity Model for ETL System ??

Examples of some metrics which can be developed

o Average Run time for a Batch job

o Average CPU for a Batch job

o CPU Utilization / Subject Area / Week

o CPU Utilization / Project / Week

o No. of Batch Jobs / GB of Storage

o No. of Batch Jobs / X Amount of CPU
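A few of these metrics can be derived directly from per-run job records. The sketch below assumes a hypothetical `JobRun` record and made-up sample runs; it uses total CPU seconds as a stand-in for "CPU utilization", since the slides do not define how utilization is measured.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRun:
    """One execution of a batch job (hypothetical record layout)."""
    job: str
    subject_area: str
    week: str
    runtime_min: float
    cpu_sec: float


def average_by_job(runs, field):
    """Average of a numeric field (e.g. 'runtime_min') per batch job."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for run in runs:
        sums[run.job] += getattr(run, field)
        counts[run.job] += 1
    return {job: sums[job] / counts[job] for job in sums}


def cpu_by_subject_area_week(runs):
    """Total CPU seconds per (subject area, week)."""
    totals = defaultdict(float)
    for run in runs:
        totals[(run.subject_area, run.week)] += run.cpu_sec
    return dict(totals)


def jobs_per_gb(runs, storage_gb):
    """Number of distinct batch jobs per GB of storage consumed."""
    return len({run.job for run in runs}) / storage_gb


# Illustrative data only.
runs = [
    JobRun("load_sales", "sales", "WW27", 30.0, 600.0),
    JobRun("load_sales", "sales", "WW27", 50.0, 1000.0),
    JobRun("load_payroll", "finance", "WW27", 20.0, 300.0),
]
print(average_by_job(runs, "runtime_min"))  # {'load_sales': 40.0, 'load_payroll': 20.0}
print(cpu_by_subject_area_week(runs))
print(jobs_per_gb(runs, 100.0))  # 0.02
```

Tracked week over week, ratios such as jobs per GB give a rough unit cost that can be multiplied by the expected number of new jobs when estimating future capacity.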

Page 16: Capacity management for ETL System

Dashboard / Indicators

Phase I

Develop a Trending Model in the beginning

Dashboards can be developed using SharePoint BI if the capacity data is captured in an Excel Pivot Table or a SQL database

Phase II

Can we develop a Predictive Model?
