Date posted: 08-Jun-2015
Category: Technology
Uploaded by: insight-technology-inc
Vectorwise Implementation Best Practices
Mark Van de Wiel
Thursday, November 01, 2012
Director Product Management, Vectorwise
Confidential © 2012 Actian Corporation
Agenda
Hardware
Operating system
Database configuration
Database design
Data loading
High availability
Monitoring
100x (+) Performance Difference – 2003 Custom C versus Relational Database
[Bar chart] TPC-H 1 GB query 1 (runtime in s): MySQL 26.2, DBMS 'X' 28.1, C program 0.2, Vectorwise 0.6
Some Numbers
Traditional RDBMS: <200 MB/s per core
  Even these use MPP to address I/O challenges
Vectorwise (lab environment): >1.5 GB/s per core
  Maximum throughput requirement is extremely high
  Realistically (cost-effectively), only RAM can serve data quickly enough
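To put the per-core figures in perspective, the following minimal sketch multiplies them by a core count and compares the result with disk throughput. The core count (16) and the per-drive rate (150 MB/s) are illustrative assumptions, not numbers from this deck.

```python
import math

# Per-core scan rates quoted on the slide, used here as rough bounds.
TRADITIONAL_MB_S_PER_CORE = 200    # "<200 MB/s per core"
VECTORWISE_MB_S_PER_CORE = 1500    # ">1.5 GB/s per core" (lab environment)


def required_bandwidth_mb_s(cores: int, mb_s_per_core: float) -> float:
    """Aggregate data rate needed to keep every core busy."""
    return cores * mb_s_per_core


def drives_needed(bandwidth_mb_s: float, drive_mb_s: float) -> int:
    """Drives required to sustain the bandwidth from disk alone."""
    return math.ceil(bandwidth_mb_s / drive_mb_s)


if __name__ == "__main__":
    cores = 16          # assumption: a two-socket server
    drive_mb_s = 150    # assumption: sequential rate of one spinning drive
    need = required_bandwidth_mb_s(cores, VECTORWISE_MB_S_PER_CORE)
    print(f"{cores} cores need about {need / 1000:.1f} GB/s")
    print(f"about {drives_needed(need, drive_mb_s)} drives at {drive_mb_s} MB/s each")
```

Under these assumptions the drive count runs into the hundreds, which is the arithmetic behind the remark that only RAM can serve data quickly enough.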
What Hardware to Use
CPU
Memory
Storage I/O and capacity
Requirements Budget
Hardware Considerations – MEMORY
Ideally frequently-accessed data should fit in memory
  May be all data
  May be a small portion of the data
  Note: data is compressed in memory buffer
    • 3x – 5x compression ratios are common
Query execution should all take place in memory
  Operations against larger data sets require more memory
  Consider query concurrency
  “Spill to disk” is supported but should be a last resort
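As a rough sizing aid, the sketch below checks whether a frequently accessed data set fits in a given memory buffer at the compression ratios mentioned above. The function names and the example sizes (600 GB of raw data, a 128 GB buffer) are hypothetical, and real compression ratios depend on the data.

```python
def compressed_size_gb(raw_gb: float, compression_ratio: float) -> float:
    """Size of the data once compressed, e.g. ratio 3.0 means 3x smaller."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_gb / compression_ratio


def fits_in_buffer(hot_raw_gb: float, buffer_gb: float,
                   compression_ratio: float = 3.0) -> bool:
    """True if the compressed hot data fits in the memory buffer."""
    return compressed_size_gb(hot_raw_gb, compression_ratio) <= buffer_gb


if __name__ == "__main__":
    hot_raw_gb = 600   # assumption: uncompressed size of the hot data
    buffer_gb = 128    # assumption: memory set aside for the buffer
    for ratio in (3.0, 5.0):
        size = compressed_size_gb(hot_raw_gb, ratio)
        verdict = "fits" if fits_in_buffer(hot_raw_gb, buffer_gb, ratio) else "does not fit"
        print(f"{ratio:.0f}x compression: {size:.0f} GB, {verdict} in {buffer_gb} GB")
```

Sizing against the lower end of the ratio range is the more conservative choice, and the calculation leaves out the memory that query execution itself needs.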
Hardware Recommendation
CPUs
  Use CPUs with a higher clock rate for better raw throughput
  Use more cores for higher throughput
  Higher-power CPUs are faster
Memory
  At least 8 GB per core (more is always better)
Storage
  Use as many drives as possible
  Ensure sufficient capacity
  Use the fastest drives available
    • SAS over SATA, ideally 15k RPM
    • SSDs are often not cost-effective relative to more memory
Examples
Small configuration (1 TB): Dell R620, Lenovo RD430
Medium configuration (single digit TBs): Dell R720, HP DL380, IBM x3650, Lenovo RD630
High-end configuration: Dell R910, HP DL580 or DL980, IBM x3750
Operating System Considerations
Linux: Red Hat, SUSE, Ubuntu
Windows: Windows 7 (or higher), Windows 2008 (or higher)
64-bit
File systems: xfs, ext3, ext4
Database Configuration
Installation defaults are generally good
  May want to adjust column buffer size (default 25% of RAM)
  May want to adjust processing memory (default 50% of RAM)
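To see what the default percentages mean on a concrete machine, the sketch below applies them to a given amount of RAM. The dictionary keys are descriptive labels chosen for this example, not actual Vectorwise parameter names, and the 128 GB figure is an assumption (16 cores at 8 GB per core).

```python
# Default shares of physical RAM as stated on the slide.
COLUMN_BUFFER_SHARE = 0.25       # column buffer size
PROCESSING_MEMORY_SHARE = 0.50   # processing (query execution) memory


def default_memory_split(ram_gb: float) -> dict:
    """Return the default allocation in GB for a machine with ram_gb of RAM."""
    column_buffer = ram_gb * COLUMN_BUFFER_SHARE
    processing = ram_gb * PROCESSING_MEMORY_SHARE
    return {
        "column_buffer_gb": column_buffer,
        "processing_memory_gb": processing,
        # Whatever the two defaults leave unallocated.
        "remaining_gb": ram_gb - column_buffer - processing,
    }


if __name__ == "__main__":
    ram_gb = 128  # assumption: 16 cores x 8 GB per core
    for name, gb in default_memory_split(ram_gb).items():
        print(f"{name}: {gb:.0f} GB")
```

If the compressed hot data is larger than the column buffer figure, or concurrent queries need more than the processing figure, that is a hint the defaults may be worth adjusting.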
Database Design
Schema – no particular preference
  Single denormalized table, star schema, snowflake schema, 3rd normal form
Constraints
  Only on empty tables today… (to be addressed in Vectorwise 3.0)
  Consider data loading order and impact
Indexes
  Note: clustered index only today (“index-organized table”)
  One per table
  Consider incremental load
Data Loading
Initial load
  File-based bulk load through vwload or copy
  Conversion into UTF8
Use tools
  Pentaho
  Informatica
  Talend
  HVR
  Attunity
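The conversion into UTF8 can be done before the bulk load with a few lines of Python. This is a minimal sketch that assumes the source files are Latin-1 encoded; the function name and the file paths in the usage comment are hypothetical, and the load itself is left out.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "latin-1") -> None:
    """Re-encode a text file to UTF-8, streaming line by line.

    newline="" keeps the original line endings untouched, so only the
    character encoding of the file changes.
    """
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)


# Usage (hypothetical file names):
# convert_to_utf8(Path("lineitem_latin1.tbl"), Path("lineitem_utf8.tbl"))
```

Streaming line by line keeps memory use flat for large load files. A wrong source encoding can either raise an error or silently produce wrong characters, so it is worth confirming the encoding with whoever produced the files.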
Data Loading
Incremental load
  INSERT, UPDATE and/or DELETE
  Append if possible
  Batch if possible
  Use COMBINE
Positional Delta Trees
  Memory considerations
  Propagation to disk
Use tools
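The "batch if possible" advice can be sketched as a small helper that groups incoming change rows so that each load step handles many rows instead of one. The helper name, the batch size and the row source are assumptions for illustration; how a batch is actually applied (for example, written to a file for a bulk load) is left open.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    incoming = range(10)  # stand-in for rows arriving from a source system
    for batch in in_batches(incoming, batch_size=4):
        # A real loader would apply the whole batch in one step here.
        print(f"applying {len(batch)} rows: {batch}")
```

Because the helper pulls from an iterator, it works on streams that do not fit in memory.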
Moving Window of Data
Considerations
  COMBINE on a large table can be expensive
  Mostly relevant for updates and deletes
Alternative: manual partitioning
  One table per period
  Single view across all tables
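The manual partitioning alternative can be illustrated by generating the view definition from the list of period tables. This is a sketch only: the table and view names are hypothetical, the statement is generic SQL (UNION ALL over identically structured tables), and the exact syntax should be checked against the Vectorwise documentation.

```python
from typing import Sequence


def period_table_name(base: str, year: int, month: int) -> str:
    """Name of the table holding one month of data, e.g. sales_2012_10."""
    return f"{base}_{year:04d}_{month:02d}"


def union_view_sql(view_name: str, tables: Sequence[str]) -> str:
    """Build a CREATE VIEW statement that unions all period tables."""
    if not tables:
        raise ValueError("at least one period table is required")
    selects = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in tables)
    return f"CREATE VIEW {view_name} AS\n{selects}"


if __name__ == "__main__":
    # A three-month window; all names are made up for the example.
    window = [period_table_name("sales", 2012, m) for m in (8, 9, 10)]
    print(union_view_sql("sales_all", window))
```

With this layout, moving the window forward means adding a table for the new period, dropping the table for the oldest one and recreating the view, so expired data can be removed without row-level deletes on one large table.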
High Availability
Hardware and OS best practices
  UPS, RAID
Vectorwise backup
  Only read-only, full backup
  Consider periodic full backup and file incremental loads
Disaster recovery
  Dual load
  Active/active possibility
Monitoring
OS monitoring
  CPU, memory utilization, I/O statistics
vwinfo data
Actian Director
DBA tools
More information in the Vectorwise Developer Guide: http://www.actian.com/images/white_papers/vw_developers_v2.5.pdf