
Performance and scalability of Informix Ultimate Warehouse Edition on Intel Xeon 7500 and E7...

Date post: 14-May-2015
Upload: keshav-murthy
Description:
Talk at Information on Demand Conference 2011. As part of the Informix Ultimate Warehouse Edition, Informix Warehouse Accelerator (IWA) transparently provides up to several orders of magnitude speedup in query performance for Informix Dynamic Server (IDS), as well as enormous administrative cost savings. Combined with the Intel Xeon E7 processor series, Informix and the Accelerator bring the performance and scalability of IDS solutions to new levels. This presentation will give best practices and benefits of IWA and the Intel Xeon E7 processors, and highlight the implications and performance benefits of running IDS and IWA on these processors, compared to previous releases of IDS and prior Intel server platforms.
Performance and Scalability of Informix® Ultimate Warehouse Edition on Intel Xeon® 7500 and E7 processors. Session Number 2864. Keshava Murthy, IBM®; Jantz Tran, Intel®
Transcript
Page 1: Performance and scalability of Informix Ultimate Warehouse Edition on Intel Xeon 7500 and E7 processors

Performance and Scalability of Informix® Ultimate Warehouse Edition on Intel Xeon® 7500 and E7 processors
Session Number 2864

Keshava Murthy, IBM®
Jantz Tran, Intel®

Page 2

Agenda

• Intel Inside
• IWA Overview
• Key performance features in Intel
• How IWA is exploiting the Intel features
• Performance results

Page 3

Tick-Tock Development Model: Sustained Xeon® Microprocessor Leadership

[Figure: tick-tock roadmap across the 65nm, 45nm, 32nm, and 22nm process generations. Products shown: Xeon® 5100, Xeon® 5300, Xeon® 7400, Xeon® 7500, Xeon® E7, Sandy Bridge-EP/EN, Ivy Bridge-EP/EN.]

Intel® Core™ Microarchitecture
• Dedicated high-speed bus per CPU
• HW-assisted virtualization (VT-x)
• First high-volume server quad-core CPUs

Nehalem/Westmere Microarchitecture
• Integrated memory controller with DDR3 support
• Turbo Boost, Intel HT, AES-NI¹
• Up to 10 cores and 30MB cache

Sandy Bridge/Ivy Bridge Microarchitecture
• End-to-end HW-assisted virtualization (VT-x, -d, -c)
• Integrated PCI Express
• Turbo Boost 2.0
• Intel Advanced Vector Extensions (AVX)
• Up to 8 cores and 20MB cache

Page 4

Intel® Xeon® processor 3000 sequence platforms (E3 in 2012)

Economical (1-way) dependable general purpose 64-bit servers well-suited for small businesses and education with features that optimize performance, uptime, and security

Intel® Xeon® processor 5000 sequence platforms (E5 in 2012)

Versatile (up to 2-way) servers for all your infrastructure, high-density, workstation and HPC applications with features that enable optimal performance and power efficiency for the data center.

Intel® Xeon® processor E7 platforms

Scalable (up to 256-way), reliable, powerful 64-bit multi-core servers offering industry-leading performance, expanded memory & I/O capacity, and advanced reliability ideal for the most demanding enterprise and mission critical workloads, large scale virtualization and large-node HPC applications.

Intel® Xeon® Processor Family for Business

Segments, in order of increasing capability:

Small Business: economical and more dependable vs. desktop
• Entry Servers and Workstations: more features and performance than traditional desktop systems

Mainstream Enterprise: best combination of performance, power efficiency, and cost
• Enterprise Server: versatility for infrastructure apps (up to 4S)
• High Performance Computing & Workstations: bandwidth-optimized for high-performance analytics & visualization
• Cloud Computing: efficient, secure, and open platforms for Internet datacenters and IaaS

Scalable Enterprise: top-of-the-line performance, scalability, and reliability
• Mission Critical: performance and reliability for the most business-critical workloads with outstanding economics
• Cloud Computing: highest virtualization density and advanced reliability for private cloud
• High Performance Computing: greater scaling and memory capacity

Page 5

Intel® Xeon® Processor E7-8800/4800/2800 Product Families: Building on Xeon® 7500 Leadership Capabilities

More Performance
• 10 cores / 20 threads
• 30MB of last level cache

More Expandable
• Supports 32GB DDR3 DIMMs (2TB per 4-socket system)¹

More Efficient
• More performance within same max CPU TDP as Xeon 7500
• Lower partial active & idle power via Intel Intelligent Power Technology²
• Support for Low Voltage DIMMs³
• Reduced power memory buffers⁴

More Security & RAS
Security
• Intel® Advanced Encryption Standard-New Instructions
• Intel® Trusted Execution Technology (TXT)
Reliability, Availability, Serviceability
• Enhanced DRAM Double Device Data Correction
• Fine Grained Memory Mirroring

Delivers more performance, expandability, and RAS while improving energy efficiency

1. Up to 64 slots per standard 4 socket system x 32GB/DIMM = 2TB
2. Uses similar core and package C6 power states enabled on Intel Xeon 5500/5600 series processors. Requires OS support.
3. Savings dependent on workload and configuration.
4. Memory buffer power savings of up to 1.3W active and 3W idle per buffer per Intel estimates. Slightly more savings when used with LV DIMMs.

Page 6

Advantages of the Xeon® E7 Platform
Intel® Xeon® Processor E7-4800 Product Family vs. Xeon® Processor 5600 Series

4-socket systems can process the biggest workloads, maximize consolidation, increase system uptime, and handle highly variable workloads.

Large Workloads & Max. Consolidation
• Over 2X the compute performance across a range of benchmarks¹
• Up to 7X memory capacity for greater performance, headroom and memory DIMM savings²
• Up to 2X higher consolidation³

Highly Variable Workloads
• More performance headroom to handle peak, unexpected, or underestimated workloads
• Compute, memory and I/O scalability extends useful server life in high-growth workloads
• Denser compute resources per server maximizes performance in constrained sites

Mission Critical Class System Availability
• Protects your data by preventing errors
• Increased availability via healing, redundancy and failover technologies
• Minimized downtime via failure prediction and proactive replacement of failing components

1. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/limits.htm
2. 64 DIMM slots vs. 18 slots for the Xeon 5600 processor series platform.
3. 2X higher consolidation refresh ratio based on ROI tool comparing Xeon 7500 and Xeon 5600 vs. older generations.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Page 7

Advanced Reliability Starts With Silicon: Intel® Xeon® processor E7 family RAS Capabilities

Memory
• Inter-socket Memory Mirroring
• Intel® Scalable Memory Interconnect (Intel® SMI) Lane Failover
• Intel® SMI Clock Fail Over
• Intel® SMI Packet Retry
• Memory Address Parity
• Failed DIMM Isolation
• Memory Board Hot Add/Remove
• Dynamic Memory Migration*
• OS Memory On-lining*
• Recovery from Single DRAM Device Failure (SDDC) plus random bit error
• Memory Thermal Throttling
• Demand and Patrol scrubbing
• Fail Over from Single DRAM Device Failure (SDDC)
• Enhanced DRAM Double Device Data Correction
• Fine Grained Memory Mirroring
• Memory DIMM and Rank Sparing
• Intra-socket Memory Mirroring
• Mirrored Memory Board Hot Add/Remove

I/O Hub
• Physical IOH Hot Add
• OS IOH On-lining*
• PCI-E Hot Plug

CPU/Socket
• Machine Check Architecture (MCA) recovery (MCA-R)
• Corrected Machine Check Interrupt (CMCI)
• Corrupt Data Containment Mode
• Viral Mode
• OS Assisted Processor Socket Migration*
• OS CPU on-lining*
• CPU Board Hot Add at QPI
• Electronically Isolated (Static) Partitioning
• Single Core Disable for Fault Resilient Boot

Intel® QuickPath Interconnect
• Intel QPI Packet Retry
• Intel QPI Protocol Protection via CRC (8bit or 16bit rolling)
• QPI Clock Fail Over
• QPI Self-Healing

Advanced reliability features work to maintain data integrity

Page 8

Intel® Xeon® processor E5-2600 product family (Sandy Bridge-EP)
New micro-architecture on the 32nm process technology

More Efficient
• Higher performance
• Lower platform power¹

More Intelligent
• Optimized Turbo Boost
• Intel Node Manager enhancements

More Secure
• Intel AES-NI improvements
• More robust Intel TXT solutions

More Options
Optimized platforms for:
• Performance
• Smaller form factors
• Best value

Platform Features
• Up to 8 cores, 20 MB cache
• New Intel® Advanced Vector Extensions
• Optimized Turbo Boost Technology
• Up to 2 QPI links between CPUs
• Integrated PCI Express* 3.0, up to 40 lanes per socket
• Up to 4 channels of DDR3 1600 memory

1. Lower platform power claim based on a Xeon® 5600 CPU and Sandy Bridge-EP CPU with the same TDP specification and comparable platform configurations. Platform power reduction is primarily attributed to TDP reduction from a two-chip solution based on the Intel 5520 chipset and ICH-10R, down to a one-chip south bridge solution (Patsburg chip) on the Sandy Bridge platform.

Page 9

INTEL: Breakthrough technologies for performance

1. Large memory support: 64-bit computing; System x with MAX5 supports up to 6TB on a single SMP box; up to 640GB on each node of a BladeCenter.
2. Large on-chip cache: L1 cache is 64KB per core, L2 cache is 256KB per core, and L3 cache is about 24-30 MB. Additional translation lookaside buffer (TLB).
3. Frequency partitioning: enabler for effective parallel access to the compressed data for scanning. Horizontal and vertical partition elimination.
4. Virtualization performance: lower overhead through Core micro-architecture enhancements, EPT, VPID, and end-to-end HW assist.
5. Hyperthreading: 2x logical processors; increases processor throughput and overall performance of threaded software.
6. Single Instruction Multiple Data: specialized instructions for manipulating 128-bit data simultaneously.
7. Multi-core, multi-node environment: Nehalem has 8 cores and Westmere 10 cores. This trend is expected to continue.

Page 10

Intel® Xeon® E7 Processor Architecture

[Figure: die diagram with Core 0 through Core 9, each with its own L1 and L2 cache, a shared L3, two integrated memory controllers (IMC), and QPI (4 links).]

• 2 integrated memory controllers
• Scalable Memory Interconnect (SMI) with support for up to 8 DDR channels
• 4 QuickPath Interconnect (QPI) system interconnect links

Cache Architecture
• 64K L1 cache
• 256K L2 cache
• 30MB, 10-slice shared last level cache (L3), compared to 24MB, 8-slice L3 on Xeon® 7500

Page 11

Intel QuickPath Architecture

• Connectivity
– Fully connected by 4 Intel® QuickPath interconnects per socket
– 6.4, 5.86, or 4.8 GT/s on all links
– With 2 IOHs: 82 PCIe lanes (72 Gen2 Boxboro lanes + 4 Gen1 lanes on unused ESI port + 6 Gen1 ICH10 lanes)
– PCI-E Gen 2.0
• Memory
– Registered DDR3 800/1066 MHz via on-board memory buffer
– 64 DIMM support (4:1 DIMM to buffer ratio)

[Figure: four 7500/E7 CPUs connected to each other by Intel® QuickPath interconnects, with two Boxboro I/O hubs and 16 memory buffers (MB).]

Page 12

Intel® Xeon® 7500/E7 8 Socket Configuration

IBM® System x3850 X5, 4+4 (8S)

• Up to 10 cores and 2.4 GHz per CPU
• Supports 8-socket mode by combining 2 systems via external QPI links

Memory Configuration
• 4TB in 8 socket server
• 6TB in 8 socket + MAX5
• Continued 1066MHz support

Page 13

Intel®: SIMD – Single Instruction Multiple Data technology

• The Intel Xeon® E7 processor supports up to SSE 4.2
• SIMD capabilities will be expanded to 256-bit registers with the new AVX instruction set in the upcoming Intel® Xeon® E5 series processors
• Informix leverages SSE in the Warehouse Accelerator

Page 14

Intel® Xeon® Processors: Virtualization Performance

Greater virtualization efficiency:
• Intel QPI
• DDR3 memory bandwidth and capacity
• Intel® VT: VT-x, VT-d, VT-c

[Figure: virtualization performance chart, VMmark* performance.]

1 Best published VMmark results as of 20 October 2010.

See legal information slide, speaker notes and backup foils (if needed) for notes and disclaimers.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Page 15

Third Generation of Database Technology

According to IDC's article (Carl Olofson), Feb. 2010

1st Generation:
- Vendor proprietary databases of IMS, IDMS, Datacom

2nd Generation:
- RDBMS for open systems, dependent on disk layout, with limitations in scalability and disk I/O
- Database tuning by updating statistics, creating/dropping indexes, data partitioning, summary tables & cubes, forcing query plans, resource governing

3rd Generation: IDC predicts that within 5 years:
• Most data warehouses will be stored in a columnar fashion
• Most OLTP databases will either be augmented by an in-memory database (IMDB) or reside entirely in memory
• Most large-scale database servers will achieve horizontal scalability through clustering

Page 16

Informix Warehouse Accelerator

[Figure: BI Applications, Informix Database Server, Informix Warehouse Accelerator, and IBM Smart Analytics Studio, annotated with Steps 1 to 5 and "Ready".]

Step 1. Install, configure, start Informix
Step 2. Install, configure, start Accelerator
Step 3. Connect Studio to Informix & add accelerator
Step 4. Design, validate, deploy data mart
Step 5. Load data to accelerator
Ready for queries

Page 17

Informix Warehouse Accelerator: 3rd Generation Database Technology is Here
Breakthrough Technology Enabling New Opportunities

What is it?
The Informix Warehouse Accelerator (IWA) is a workload-optimized, appliance-like add-on that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries, with unprecedented response times.

How is it different?
• Performance: Unprecedented response times to enable 'train of thought' analysis frequently blocked by poor query performance.
• Integration: Connects to IDS through deep integration, providing transparency to all applications.
• Self-managed workloads: Queries are executed in the most efficient way.
• Transparency: Applications connected to IDS are entirely unaware of IWA.
• Simplified administration: Appliance-like hands-free operations, eliminating many database tuning tasks.

Page 18

Page 19

IWA Software Components

• Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11)
• IDS 11.70 + IWA code modules including IDS Stored Procedures
– Linux on Intel (64 bit)
– AIX on Power (64 bit)
– HP-UX on Itanium (64 bit)
– Solaris on SPARC (64 bit)
• ISAO Studio Plug-in – GUI for Mart definition
• OnIWA – On Utilities for Monitoring IWA

Page 20

INTEL/IWA: Breakthrough technologies for performance

1. Large memory support: 64-bit computing; System x with MAX5 supports up to 6TB on a single SMP box; up to 640GB on each node of a BladeCenter. IWA: Compresses large datasets and keeps them in memory, avoiding I/O entirely.
2. Large on-chip cache: L1 cache is 64KB per core, L2 cache is 256KB per core, and L3 cache is about 24-30 MB. Additional translation lookaside buffer (TLB). IWA: New algorithms avoid pipeline flushing and keep hash tables in L2/L3 cache.
3. Frequency partitioning. IWA: Enabler for effective parallel access to the compressed data for scanning. Horizontal and vertical partition elimination.
4. Virtualization performance: lower overhead through Core micro-architecture enhancements, EPT, VPID, and end-to-end HW assist. IWA: Helps Informix and IWA run and perform seamlessly in virtualized environments.
5. Hyperthreading: 2x logical processors; increases processor throughput and overall performance of threaded software. IWA: Does not exploit this, since the software is written to avoid pipeline flushing.
6. Single Instruction Multiple Data: specialized instructions for manipulating 128-bit data simultaneously. IWA: Compresses the data into a deep columnar format optimized to exploit SIMD. Used in parallel predicate evaluation in scans.
7. Multi-core, multi-node environment: Nehalem has 8 cores and Westmere 10 cores. This trend is expected to continue. IWA: Parallelizes the scan, join, and group operations. Keeps copies of dimensions to avoid cross-node synchronization.

Page 21

IWA: Multi-core and Multi-node environment

[Figure: applications and BI tools connect to Informix (local execution, or offload to the accelerator). The accelerator has one coordinator and four workers; each worker holds compressed data in memory with a memory image on disk.]

Step 1. Applications and BI tools submit SQL to Informix. Database protocol: SQLI or DRDA. Network: TCP/IP, SHM.
Step 2. Query matching and redirection technology in Informix.
Step 3. Offload SQL to the coordinator: DRDA over TCP/IP.
Step 4. Results come back from the coordinator: DRDA over TCP/IP.
Step 5. Return results/describe/error to the application. Database protocol: SQLI or DRDA. Network: TCP/IP, SHM.

Page 22

IWA: Multi-core and Multi-node environment

[Figure: one coordinator and four workers, each worker holding compressed data in memory.]

Step 1. SQL from Informix arrives at the coordinator
Step 2. Send the queries to all the workers
Step 3. Each worker: scan, filter, join, group
Step 4. Merge intermediate results, ORDER BY, FIRST n
Step 5. Send the results back to the Informix server
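
The coordinator and worker steps on this slide can be sketched as a scatter-gather in Python. This is a minimal illustration under stated assumptions, not IWA's implementation: `worker_scan`, `coordinator`, and the (region, quantity) row layout are hypothetical, and threads stand in for the worker processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def worker_scan(rows, min_qty):
    """Step 3: scan, filter, and group one partition; return partial sums."""
    partial = Counter()
    for region, qty in rows:
        if qty >= min_qty:          # filter
            partial[region] += qty  # group and aggregate
    return partial


def coordinator(partitions, min_qty, first_n):
    # Step 2: send the same query to every worker (threads are a stand-in).
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: worker_scan(p, min_qty), partitions))
    # Step 4: merge intermediate results, then ORDER BY and FIRST n.
    merged = Counter()
    for partial in partials:
        merged.update(partial)      # Counter.update adds the partial sums
    ordered = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:first_n]


# Hypothetical data: three worker partitions of (region, quantity) rows.
partitions = [
    [("east", 5), ("west", 1), ("east", 7)],
    [("west", 9), ("north", 4)],
    [("north", 6), ("east", 2)],
]
top = coordinator(partitions, min_qty=2, first_n=2)
print(top)
```

Each worker returns only a small partial aggregate, so the merge in step 4 touches far less data than the scans in step 3.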

Page 23

IWA: Multi-core and Multi-node environment
Compressed and Partitioned Data

[Figure: the query executor runs Cell 1, Cell 2, and Cell 3 on separate cores, each labeled "core + $ (HT)", alongside the dictionaries.]

• Cell is also the unit of processing; each cell…
– Is assigned to one core
– Has its own hash table in cache (so no shared object that needs latching!)
• Main operator: SCAN over compressed, main-memory table
– Do selections, GROUP BY, and aggregation as part of this SCAN
– Only need to decompress for aggregation
• Response time ∝ (database size) / (# cores x # nodes)
– Embarrassing parallelism: little data exchange across nodes

Page 24

Exploiting Larger Memory: Row Oriented Data Store

Each row stored sequentially
• Optimized for record I/O
• Fetch and decompress entire row, every time
• Result:
– Very efficient for transactional workloads
– Not always efficient for analytical workloads

If only a few columns are required, the complete row is still fetched and decompressed.

Page 25

Exploiting Larger Memory: Data is Processed in Compressed Format

• Within a register store, several columns are grouped together.
• The sum of the widths of the compressed columns doesn't exceed a register-compatible width. This utilizes the full capabilities of a 64-bit system. It doesn't matter how many columns are placed within the register-wide data element.
• It is beneficial to place commonly used columns within the same register-wide data element. But this requires dynamic knowledge about the executed workload (runtime statistics).
• Having multiple columns within the same register-wide data element prevents ANDing of different results.

The register store is an optimization of the column store approach where we try to make the best use of existing hardware. Reshuffling small data elements at runtime into a register is time consuming and can be avoided. The register store also delivers good vectorization capabilities.

Predicate evaluation is done against compressed data!

Page 26

Exploiting Large Memory: Compression: Frequency Partitioning

[Figure: a trade info table (volume, product, origin country). A histogram on Origin (number of occurrences) separates common values (China, USA; then GER, FRA, …) from rare values (Rest). A histogram on Product separates the top 64 traded goods (6-bit code) from the rest. The column partitions on Origin and Product divide the table into Cells 1 to 6.]

• Field lengths vary between cells
• Higher frequencies get shorter codes (approximate Huffman)
• Field lengths fixed within cells
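
As a rough illustration of frequency partitioning, the Python sketch below splits a column's distinct values into a "common" and a "rare" partition and gives each partition its own fixed-width dictionary code. It is a simplified sketch, not IWA's algorithm: `frequency_partition`, the two-partition split, and the sample data are assumptions made for the example. With 64 values in the common partition the code width comes out at 6 bits, matching the "top 64 traded goods" case on the slide.

```python
from collections import Counter
from math import ceil, log2


def frequency_partition(values, top_k):
    """Split distinct values by frequency and build one dictionary per
    partition; codes have a fixed width within a partition."""
    ranked = [v for v, _ in Counter(values).most_common()]
    dictionaries = []
    for part in (ranked[:top_k], ranked[top_k:]):
        if not part:
            continue
        width = max(1, ceil(log2(len(part))))  # bits per code in this partition
        codes = {value: code for code, value in enumerate(part)}
        dictionaries.append({"width": width, "codes": codes})
    return dictionaries


# Hypothetical origin-country column: two very common values, several rare ones.
origins = (["China"] * 50 + ["USA"] * 40 + ["GER"] * 3 + ["FRA"] * 2
           + ["ITA", "ESP", "JPN", "BRA", "IND"])
common, rare = frequency_partition(origins, top_k=2)
print(common["width"], rare["width"])
```

Here the two common values need only a 1-bit code while the seven rare values need 3 bits, so the bulk of the rows is stored with the shortest code.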

Page 27

IWA: SIMD: Register Stores Facilitate SIMD Parallelism

• Access only the banks referenced in the query (like a column store):
– SELECT SUM (T.G)
– FROM T
– WHERE T.A > 5
– GROUP BY T.D
• Pack multiple rows from the same bank into the 128-bit register
• Enables yet another layer of parallelism: SIMD (Single-Instruction, Multiple-Data)!

[Figure: a cell block split into Bank β1 (32 bits; columns A, D, G), Bank β2 (32 bits; columns B, E, F), and Bank β3 (16 bits; columns C, H), with rows 1 to 4. Four 32-bit values are packed into one 128-bit register; a vector operation applies an operand to each value and yields Result1 to Result4.]
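
The bank-selection point can be sketched in Python: for the query on this slide, only the bank holding columns A, D, and G has to be read. The sketch is illustrative only; the `banks` layout, the sample values, and both function names are hypothetical, and plain Python tuples stand in for packed 32-bit bank entries.

```python
from collections import defaultdict

# Hypothetical cell block: each bank stores a subset of the columns per row.
banks = {
    "bank1": {"columns": ("A", "D", "G"),
              "rows": [(7, "x", 10), (3, "x", 20), (9, "y", 30), (6, "x", 40)]},
    "bank2": {"columns": ("B", "E", "F"), "rows": [(0, 0, 0)] * 4},
    "bank3": {"columns": ("C", "H"), "rows": [(0, 0)] * 4},
}


def banks_needed(columns):
    """Return the names of the banks that hold any referenced column."""
    return [name for name, bank in banks.items()
            if set(columns) & set(bank["columns"])]


def sum_g_where_a_gt_group_by_d(threshold):
    # SELECT SUM(T.G) FROM T WHERE T.A > threshold GROUP BY T.D
    totals = defaultdict(int)
    for a, d, g in banks["bank1"]["rows"]:   # banks 2 and 3 are never read
        if a > threshold:
            totals[d] += g
    return dict(totals)


touched = banks_needed(["A", "D", "G"])
result = sum_g_where_a_gt_group_by_d(5)
print(touched, result)
```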

Page 28

IWA: SIMD: Simultaneous Evaluation of Equality Predicates

• CPU operates on 128-bit units
• Lots of fields fit in 128 bits
• These fields are at fixed offsets
• Apply predicates to all columns simultaneously!

Translate the value query to a code query:
State == 'CA' && Quarter == 'Q4'
State == 01001 && Quarter == 1110

[Figure: each row (… State … Quarter …) is ANDed (&) with the mask 11111 0 1111 0 and compared (==) with 01001 0 1110 0 to give the selection result.]
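
The mask-and-compare idea can be shown with ordinary integers. In this minimal Python sketch, a Python int stands in for the 128-bit register, the 5-bit State and 4-bit Quarter widths are taken from the codes on the slide, and the separator bits shown in the figure are left out; `pack` and `matches` are hypothetical helper names.

```python
STATE_BITS, QUARTER_BITS = 5, 4


def pack(state_code, quarter_code):
    """Place both dictionary codes at fixed offsets in one integer word."""
    return (state_code << QUARTER_BITS) | quarter_code


# The mask selects the bits of every column that appears in the predicate.
MASK = pack((1 << STATE_BITS) - 1, (1 << QUARTER_BITS) - 1)
# State == 'CA' (code 01001) AND Quarter == 'Q4' (code 1110), as on the slide.
TARGET = pack(0b01001, 0b1110)


def matches(row_word):
    # One AND plus one comparison checks both equality predicates at once.
    return (row_word & MASK) == TARGET


rows = [pack(0b01001, 0b1110), pack(0b01001, 0b0001), pack(0b00011, 0b1110)]
selected = [matches(word) for word in rows]
print(selected)
```

Because the comparison runs on the dictionary codes, no value is decompressed while the predicate is evaluated.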

Page 29

Exploiting Large on-chip Cache

• Encoding makes grouping simple!
– Coded values are assigned densely (by construction)
– Hence, in principle, grouping is simple: aggTable[group] += aggValue
• Challenges:
– Fitting the hash table in L2 cache
– Avoiding all branches in hash table lookup
• IWA adaptively uses one of 2 techniques, depending on the number of distinct groups:
1. Use the dictionary code as a perfect hash (i.e. collision-free), OR
– aggTable[groupCode] += aggValue
– No branches, no hash function computation
– Works great if groupCode is dense, i.e., a single column, or multiple columns with little correlation
2. Use usual linear probing
– Involves branches, random access, …
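
The two grouping techniques can be sketched side by side in Python. This is an illustration of the general idea only: the function names are hypothetical, and `dense_limit` is an invented threshold for choosing between the two paths, not a documented IWA setting.

```python
def group_dense(group_codes, agg_values, num_codes):
    """Technique 1: the dictionary code indexes the table directly."""
    agg_table = [0] * num_codes
    for code, value in zip(group_codes, agg_values):
        agg_table[code] += value          # no hash function, no branch
    return agg_table


def group_linear_probing(group_keys, agg_values, capacity):
    """Technique 2: open addressing with linear probing.
    capacity must exceed the number of distinct keys."""
    keys = [None] * capacity
    sums = [0] * capacity
    for key, value in zip(group_keys, agg_values):
        slot = hash(key) % capacity
        while keys[slot] is not None and keys[slot] != key:
            slot = (slot + 1) % capacity  # probe the next slot
        keys[slot] = key
        sums[slot] += value
    return {k: s for k, s in zip(keys, sums) if k is not None}


def group_by(group_codes, agg_values, num_codes, dense_limit=1 << 16):
    # dense_limit is an illustrative threshold chosen for this sketch.
    if num_codes <= dense_limit:
        table = group_dense(group_codes, agg_values, num_codes)
        return {code: table[code] for code in set(group_codes)}
    return group_linear_probing(group_codes, agg_values, 2 * num_codes)


codes = [0, 2, 0, 1, 2]
values = [10, 5, 1, 7, 3]
dense_result = group_by(codes, values, num_codes=3)
probed_result = group_by(codes, values, num_codes=3, dense_limit=2)
print(dense_result, probed_result)
```

Both paths give the same totals; the dense path simply trades table size for a lookup with no comparisons or probing.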

Page 30

Case Study #1: U.S. Government Agency

Page 31

Case Study #2: Datamart at a Government Agency

• A MicroStrategy report was run, which generates 667 SQL statements, of which 537 were SELECT statements
• The datamart for this report has 250 tables and 30 GB of data
• The original report on XPS and a Sun SPARC M9000 took 90 mins
• With IDS 11.7 on a Linux Intel box, it took 40 mins
• With IWA, it took 67 seconds
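
The speedup implied by the three timings in this case study can be worked out directly. The short Python sketch below uses only the figures quoted on the slide; the variable and function names are illustrative.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline run time to accelerated run time."""
    return baseline_seconds / accelerated_seconds


xps_seconds = 90 * 60   # original report on XPS and the Sun SPARC M9000
ids_seconds = 40 * 60   # IDS 11.7 on the Linux Intel box
iwa_seconds = 67        # with IWA

vs_xps = speedup(xps_seconds, iwa_seconds)
vs_ids = speedup(ids_seconds, iwa_seconds)
print(f"{vs_xps:.1f}x vs XPS, {vs_ids:.1f}x vs IDS 11.7")
```

That is roughly an 80x reduction against the original XPS run and roughly 36x against IDS 11.7 alone.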

Page 32

Case Study #3: Skechers, USA. Shoe Retailer

• Top 7 time-consuming queries in Retail BI and Warehouse (against 1 billion row fact tables):

Query IDS 11.5 IDS 11.7 IWA

1 22 mins 4 secs

2 1 min 3 secs 2 secs

3 3 mins 40 secs 2 secs

4 30 mins & up 4 secs

5 2 mins 2 secs

6 30 mins 2 secs

7 45 mins & up 2 secs

Query acceleration 30x to 1400x – average acceleration 450x

Page 33

Systems Tested

• 4S Intel® Xeon® 7560 (whitebox)
– 2.26 GHz 8C CPU
• 4S Intel® Xeon® E7 4870 (whitebox)
– 2.40 GHz 10C CPU
– 256GB 1066 MHz DDR3 memory
• 8S Intel® Xeon® 7560 (IBM® System x3850 X5)
– 2.26 GHz 8C CPU
– 2TB 1066 MHz DDR3 memory

Page 34

POPS schema

[Figure: schema diagram with the tables daily_sales, daily_forecast, Customer, Store, Period, Product, and Promotion; row counts shown: 350 million rows and 1 billion rows.]

Page 35
Page 36
Page 37
Page 38

Systems Tested

• 8S Intel® Xeon® 7560 (IBM® System x3850 X5)
– 2.26 GHz 8C CPU
– 2TB 1066 MHz DDR3 memory

Page 39

Store Sales ER Diagram, 500 GB SSED

[Figure: entity-relationship diagram. Row counts shown: 4,594,771,672; 20; 73,049; 1,920,800; 1,000; 204,000; 1,000,000; 402; 86,400; 7,200; 2,000,000.]

Page 40

IWA1    IWA2    IWA3    IWA4    IWA AVG    IDS1      IDS2      IDS3      IDS AVG      Improvement (%)
109046  104246  92653   97666   100902.75  3294554   3338352   3341873   3324926.333  3295.179104
31190   27175   26927   27417   28177.25   1538219   1538364   1538959   1538514      5460.128295
93377   97192   95638   92691   94724.5    1910772   1884782   1899916   1898490      2004.222772
119587  117053  117513  117902  118013.75  1765145   1722746   1690400   1726097      1462.623635
37587   33551   35579   31651   34592      3167302   3173656   3150876   3163944.667  9146.463537
28228   29301   24602   29846   27994.25   1525738   1526089   1528724   1526850.333  5454.156955
27644   28075   30083   29362   28791      2201956   2211549   2517291   2310265.333  8024.262212
119871  123030  123593  117572  121016.5   5963515   6044626   5947525   5985222      4945.790037
38346   46412   44463   44918   43534.75   1578035   1557525   1544912   1560157.333  3583.705737
48450   46470   50032   43668   47155      1526529   1547404   1563874   1545935.667  3278.413035
43823   42441   45837   43215   43829      21990513  22354449  21903105  22082689     50383.73908
47400   46582   46573   47031   46896.5    2251672   2278167   2281946   2270595      4841.715267
56961   58315   56437   60119   57958      5295930   5310507   5325095   5310510.667  9162.687923
9037    9132    8724    9083    8994       2523942   2529234   2522585   2525253.667  28077.09214
47062   52354   51374   49932   50180.5    1546319   1570163   1568083   1561521.667  3111.8097
47643   50415   55660   52788   51626.5    2274649   2264463   2269677   2269596.333  4396.184776
85154   85711   83824   91692   86595.25   1620173   1656098   1606029   1627433.333  1879.356354
59766   59341   55436   58522   58266.25   5311906   5307202   5266918   5295342      9088.18055
8230    8207    8054    8115    8151.5     2159777   2179435   2181312   2173508      26663.90235
152764  152408  149153  151100  151356.25  2050590   2065049   2060862   2058833.667  1360.256789
30991   29582   27391   24197   28040.25   2025557   2037336   2040515   2034469.333  7255.532077
141504  145702  142908  139664  142444.5   5363204   5165693   5393336   5307411      3725.950107

Column totals: IWA1 1383661; IWA2 1392695; IWA3 1372454; IWA4 1368151; IWA AVG 1379240.25; IDS2 79262889
ArithMean of Improvement: 8936.42511
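
The Improvement column is consistent with the IDS average divided by the IWA average, expressed as a percentage. The Python sketch below checks this for the row whose Improvement value is 3295.179104, using that row's three IDS runs and four IWA runs; the function name is illustrative, and the unit of the run times is not stated in the slides.

```python
from statistics import mean


def improvement_percent(ids_runs, iwa_runs):
    """IDS average divided by IWA average, in percent."""
    return 100 * mean(ids_runs) / mean(iwa_runs)


ids_runs = [3294554, 3338352, 3341873]      # IDS1 to IDS3
iwa_runs = [109046, 104246, 92653, 97666]   # the four IWA runs
value = improvement_percent(ids_runs, iwa_runs)
print(round(value, 3))
```

An Improvement of about 3295 therefore means the query ran roughly 33 times faster with IWA than with IDS alone.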

Page 41
Page 42

Thank You! Your Feedback is Important to Us

• Access your personal session survey list and complete via SmartSite
– Your smart phone or web browser at: iodsmartsite.com
– Any SmartSite kiosk onsite
• Each completed session survey increases your chance to win an Apple iPod Touch with daily drawing sponsored by Alliance Tech

Session Number 2864

Page 43

Thank you!