Date post: | 14-May-2015 |
Category: |
Technology |
Upload: | keshav-murthy |
Performance and Scalability of Informix® Ultimate Warehouse Edition on Intel Xeon® 7500 and E7 processors
Session Number 2864
Keshava Murthy, IBM®
Jantz Tran, Intel®
Agenda
• Intel Inside
• IWA Overview
• Key performance features in Intel
• How IWA exploits the Intel features
• Performance results
Tick-Tock Development Model: Sustained Xeon® Microprocessor Leadership
[Timeline: alternating tick and tock steps across the 65nm, 45nm, 32nm and 22nm process nodes]
Intel® Core™ Microarchitecture (65nm, 45nm): Xeon® 5100, Xeon® 5300, Xeon® 7400
• Dedicated high-speed bus per CPU
• HW-assisted virtualization (VT-x)
• First high-volume server Quad-Core CPUs
Nehalem/Westmere Microarchitecture (45nm, 32nm): Xeon® 7500, Xeon® E7
• Integrated memory controller with DDR3 support
• Turbo Boost, Intel HT, AES-NI
• End-to-end HW-assisted virtualization (VT-x, -d, -c)
• Up to 10 cores and 30MB cache
Sandy Bridge/Ivy Bridge Microarchitecture (32nm, 22nm): Sandy Bridge-EP/EN, Ivy Bridge EP/EN
• Integrated PCI Express
• Turbo Boost 2.0
• Intel Advanced Vector Extensions (AVX)
• Up to 8 cores and 20MB cache
Intel® Xeon® processor 3000 sequence platforms (E3 in 2012)
Economical (1-way) dependable general purpose 64-bit servers well-suited for small businesses and education with features that optimize performance, uptime, and security
Intel® Xeon® processor 5000 sequence platforms (E5 in 2012)
Versatile (up to 2-way) servers for all your infrastructure, high-density, workstation and HPC applications with features that enable optimal performance and power efficiency for the data center.
Intel® Xeon® processor E7 platforms
Scalable (up to 256-way), reliable, powerful 64-bit multi-core servers offering industry-leading performance, expanded memory & I/O capacity, and advanced reliability ideal for the most demanding enterprise and mission critical workloads, large scale virtualization and large-node HPC applications.
Intel® Xeon® Processor Family for Business
Tiers, in order of increasing capability:
• Small Business: Economical and more dependable vs. desktop
• Mainstream Enterprise: Best combination of performance, power efficiency, and cost
• Scalable Enterprise: Top-of-the-line performance, scalability, and reliability
Segments:
• Entry Servers and Workstations: More features and performance than traditional desktop systems
• Enterprise Server: Versatility for infrastructure apps (up to 4S)
• High Performance Computing & Workstations: Bandwidth-optimized for high performance analytics & visualization
• Cloud Computing: Efficient, secure, and open platforms for Internet datacenters and IaaS
• Cloud Computing: Highest virtualization density and advanced reliability for private cloud
• Mission Critical: Performance and reliability for the most business critical workloads with outstanding economics
• High Performance Computing: Greater scaling and memory capacity
Intel® Xeon® Processor E7-8800/4800/2800 Product Families: Building on Xeon® 7500 Leadership Capabilities
Delivers more Performance, Expandability and RAS while improving Energy Efficiency
More Performance
• 10 cores / 20 threads
• 30MB of last level cache
More Expandable
• Supports 32GB DDR3 DIMMs (2TB per 4-socket system) [1]
More Efficient
• More performance within same max CPU TDP as Xeon 7500
• Lower partial active & idle power via Intel Intelligent Power Technology [2]
• Support for Low Voltage DIMMs [3]
• Reduced power memory buffers [4]
More Security & RAS
SECURITY
• Intel® Advanced Encryption Standard-New Instructions
• Intel® Trusted Execution Technology (TXT)
RELIABILITY, AVAILABILITY, SERVICEABILITY
• Enhanced DRAM Double Device Data Correction
• Fine Grained Memory Mirroring
Notes:
1. Up to 64 slots per standard 4 socket system x 32GB/DIMM = 2TB.
2. Uses similar core and package C6 power states enabled on Intel Xeon 5500/5600 series processors. Requires OS support.
3. Savings dependent on workload and configuration.
4. Memory buffer power savings of up to 1.3W active and 3W idle per buffer per Intel estimates. Slightly more savings when used with LV DIMMs.
Advantages of the Xeon® E7 Platform
Intel® Xeon® Processor E7-4800 Product Family vs. Xeon® Processor 5600 Series
4-socket systems can process the biggest workloads, maximize consolidation, increase system uptime, and handle highly variable workloads.
Large Workloads & Max. Consolidation
• Over 2X the compute performance across a range of benchmarks [1]
• Up to 7X memory capacity for greater performance, headroom and memory DIMM savings [2]
• Up to 2X higher consolidation [3]
Highly Variable Workloads
• More performance headroom to handle peak, unexpected, or underestimated workloads
• Compute, memory and I/O scalability extends useful server life in high-growth workloads
• Denser compute resources per server maximizes performance in constrained sites
Mission Critical Class System Availability
• Protects your data by preventing errors
• Increased availability via healing, redundancy and failover technologies
• Minimized downtime via failure prediction and proactive replacement of failing components
Notes:
1. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/limits.htm
2. 64 DIMM slots vs. 18 slots for the Xeon 5600 processor series platform.
3. 2X higher consolidation refresh ratio based on ROI tool comparing Xeon 7500 and Xeon 5600 vs. older generations.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Advanced Reliability Starts With Silicon: Intel® Xeon® processor E7 family RAS Capabilities
Advanced reliability features work to maintain data integrity
Memory
• Inter-socket Memory Mirroring
• Intel® Scalable Memory Interconnect (Intel® SMI) Lane Failover
• Intel® SMI Clock Fail Over
• Intel® SMI Packet Retry
• Memory Address Parity
• Failed DIMM Isolation
• Memory Board Hot Add/Remove
• Dynamic Memory Migration*
• OS Memory On-lining*
• Recovery from Single DRAM Device Failure (SDDC) plus random bit error
• Memory Thermal Throttling
• Demand and Patrol scrubbing
• Fail Over from Single DRAM Device Failure (SDDC)
• Enhanced DRAM Double Device Data Correction
• Fine Grained Memory Mirroring
• Memory DIMM and Rank Sparing
• Intra-socket Memory Mirroring
• Mirrored Memory Board Hot Add/Remove
I/O Hub
• Physical IOH Hot Add
• OS IOH On-lining*
• PCI-E Hot Plug
CPU/Socket
• Machine Check Architecture (MCA) recovery (MCA-R)
• Corrected Machine Check Interrupt (CMCI)
• Corrupt Data Containment Mode
• Viral Mode
• OS Assisted Processor Socket Migration*
• OS CPU on-lining*
• CPU Board Hot Add at QPI
• Electronically Isolated (Static) Partitioning
• Single Core Disable for Fault Resilient Boot
Intel® QuickPath Interconnect
• Intel QPI Packet Retry
• Intel QPI Protocol Protection via CRC (8bit or 16bit rolling)
• QPI Clock Fail Over
• QPI Self-Healing
Intel® Xeon® processor E5-2600 product family (Sandy Bridge-EP)
New micro-architecture on the 32nm process technology
More Efficient
• Higher performance
• Lower platform power [1]
More Intelligent
• Optimized Turbo Boost
• Intel Node Manager enhancements
More Secure
• Intel AES-NI improvements
• More robust Intel TXT solutions
More Options
• Optimized platforms for: Performance, Smaller Form Factors, Best value
Platform Features
• Up to 8 cores, 20 MB cache
• New Intel® Advanced Vector Extensions
• Optimized Turbo Boost Technology
• Up to 2 QPI links between CPUs
• Integrated PCI Express* 3.0, up to 40 lanes per socket
• Up to 4 channels of DDR3 1600 memory
Notes:
1. Lower platform power claim based on a Xeon® 5600 CPU and Sandy Bridge-EP CPU with the same TDP specification and comparable platform configurations. Platform power reduction is primarily attributed to TDP reduction from a two-chip solution based on the Intel 5520 chip set and ICH-10R, down to a one-chip south bridge solution (Patsburg chip) on the Sandy Bridge platform.
INTEL: Breakthrough technologies for performance
1. Large memory support: 64-bit computing; System X with MAX5 supports up to 6TB on a single SMP box; up to 640GB on each node of blade center.
2. Large on-chip cache: L1 cache is 64KB per core, L2 cache is 256KB per core and L3 cache is about 24-30 MB. Additional translation lookaside buffer (TLB).
3. Frequency partitioning: Enabler for the effective parallel access of the compressed data for scanning. Horizontal and vertical partition elimination.
4. Virtualization performance: Lower overhead: Core micro-architecture enhancements, EPT, VPID, and end-to-end HW assist.
5. Hyperthreading: 2x logical processors; increases processor throughput and overall performance of threaded software.
6. Single Instruction Multiple Data: Specialized instructions for manipulating 128-bit data simultaneously.
7. Multi-core, multi-node environment: Nehalem has 8 cores and Westmere 10 cores. This trend is expected to continue.
Intel® Xeon® E7 Processor Architecture
[Diagram: ten cores (Core 0 to Core 9), each with its own L1 and L2 cache, around a shared L3; two IMCs; QPI (4 links)]
• 2 integrated memory controllers
• Scalable Memory Interconnect (SMI) with support for up to 8 DDR channels
• 4 Quick Path Interconnect (QPI) system interconnect links
Cache Architecture
• 64K L1 cache
• 256K L2 cache
• 30MB, 10-slice shared last level cache (L3), compared to 24MB, 8-slice L3 on Xeon® 7500
Intel QuickPath Architecture
• Connectivity
– Fully connected by 4 Intel® QuickPath interconnects per socket
– 6.4, 5.86, or 4.8 GT/s on all links
– With 2 IOHs: 82 PCIe lanes (72 Gen2 Boxboro lanes + 4 Gen1 lanes on unused ESI port + 6 Gen1 ICH10 lanes)
– PCI-E Gen 2.0
• Memory
– Registered DDR3 800/1066 MHz via on-board memory buffer
– 64 DIMM support (4:1 DIMM to buffer ratio)
[Diagram: four 7500/E7 CPUs linked by Intel® QuickPath interconnects, two Boxboro I/O hubs, and sixteen memory buffers (MB)]
Intel® Xeon® 7500/E7 8 Socket Configuration
IBM® System x3850 X5: 4+4 (8S)
• Up to 10 cores and 2.4 GHz per CPU
• Supports 8 socket mode by combining 2 systems via external QPI links
Memory Configuration
• 4TB in 8 socket server
• 6TB in 8 socket + MAX5
• Continued 1066MHz support
Intel®: SIMD – Single Instruction Multiple Data technology
• The Intel Xeon® E7 processor supports up to SSE 4.2
• SIMD capabilities will be expanded to 256-bit registers with the new AVX
instruction set in the upcoming Intel® Xeon® E5 series processors
• Informix leverages SSE in the Warehouse Accelerator
Intel® Xeon® Processors: Virtualization Performance
Greater Virtualization Efficiency:
• Intel QPI
• DDR3 memory bandwidth and capacity
• Intel® VT: VT-x, VT-d, VT-c
[Chart: VMmark* Performance]
1. Best published VMmark results as of 20 October 2010.
See legal information slide, speaker notes and backup foils (if needed) for notes and disclaimers.
Third Generation of Database Technology
According to IDC's article (Carl Olofson), Feb. 2010
1st Generation:
- Vendor proprietary databases: IMS, IDMS, Datacom
2nd Generation:
- RDBMS for open systems; dependent on disk layout, with limitations in scalability and disk I/O
- Database tuning by updating statistics, creating/dropping indexes, data partitioning, summary tables & cubes, forcing query plans, resource governing
3rd Generation: IDC predicts that within 5 years:
• Most data warehouses will be stored in a columnar fashion
• Most OLTP databases will either be augmented by an in-memory database (IMDB) or reside entirely in memory
• Most large-scale database servers will achieve horizontal scalability through clustering
Informix Warehouse Accelerator
[Diagram: BI Applications, Informix Database Server, Informix Warehouse Accelerator, IBM Smart Analytics Studio]
Step 1. Install, configure, start Informix
Step 2. Install, configure, start Accelerator
Step 3. Connect Studio to Informix & add accelerator
Step 4. Design, validate, deploy data mart
Step 5. Load data to accelerator
Ready for queries
Informix Warehouse Accelerator: 3rd Generation Database Technology is Here
Breakthrough Technology Enabling New Opportunities
What is it?
The Informix Warehouse Accelerator (IWA) is a workload-optimized, appliance-like add-on that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries, with unprecedented response times.
How is it different?
• Performance: Unprecedented response times to enable 'train of thought' analysis frequently blocked by poor query performance.
• Integration: Connects to IDS through deep integration, providing transparency to all applications.
• Self-managed workloads: Queries are executed in the most efficient way.
• Transparency: Applications connected to IDS are entirely unaware of IWA.
• Simplified administration: Appliance-like hands-free operations, eliminating many database tuning tasks.
IWA Software Components
• Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11)
• IDS 11.70 + IWA code modules including IDS Stored Procedures
– Linux on Intel (64 bit)
– AIX on Power (64 bit)
– HPUX on Itanium (64 bit)
– Solaris on SPARC (64 bit)
• ISAO Studio Plug-in – GUI for Mart definition
• OnIWA – On Utilities for Monitoring IWA
INTEL/IWA: Breakthrough technologies for performance
1. Large memory support: 64-bit computing; System X with MAX5 supports up to 6TB on a single SMP box; up to 640GB on each node of blade center.
   IWA: Compresses large datasets and keeps them in memory; avoids I/O entirely.
2. Large on-chip cache: L1 cache is 64KB per core, L2 cache is 256KB per core and L3 cache is about 4-12 MB. Additional translation lookaside buffer (TLB).
   IWA: New algorithms to avoid pipeline flushing and to cache hash tables in L2/L3 cache.
3. Frequency partitioning
   IWA: Enabler for the effective parallel access of the compressed data for scanning. Horizontal and vertical partition elimination.
4. Virtualization performance: Lower overhead: Core micro-architecture enhancements, EPT, VPID, and end-to-end HW assist.
   IWA: Helps Informix and IWA run and perform seamlessly in a virtualized environment.
5. Hyperthreading: 2x logical processors; increases processor throughput and overall performance of threaded software.
   IWA: Does not exploit this, since the software is written to avoid pipeline flushing.
6. Single Instruction Multiple Data: Specialized instructions for manipulating 128-bit data simultaneously.
   IWA: Compresses the data into a deep columnar format optimized to exploit SIMD. Used in parallel predicate evaluation in scans.
7. Multi-core, multi-node environment: Nehalem has 8 cores and Westmere 10 cores. This trend is expected to continue.
   IWA: Parallelizes the scan, join, and group operations. Keeps copies of dimensions to avoid cross-node synchronization.
IWA: Multi-core and Multi-node environment
Step 1. Applications and BI tools submit SQL to Informix. Database protocol: SQLI or DRDA. Network: TCP/IP, SHM.
Step 2. Informix applies query matching and redirection technology (the alternative path is Local Execution).
Step 3. Informix offloads the SQL to the coordinator: DRDA over TCP/IP.
Step 4. The coordinator returns results: DRDA over TCP/IP.
Step 5. Informix returns results/describe/error to the application. Database protocol: SQLI or DRDA. Network: TCP/IP, SHM.
[Diagram: one coordinator and several workers; each worker holds compressed data in memory, with a memory image on disk]
IWA: Multi-core and Multi-node environment
Step 1: The coordinator receives SQL from Informix.
Step 2: The coordinator sends the queries to all the workers.
Step 3: Each worker scans, filters, joins, and groups its compressed in-memory data.
Step 4: The coordinator merges intermediate results and applies ORDER BY and FIRST n.
Step 5: The coordinator sends the results back to the Informix server.
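The coordinator/worker steps above can be sketched as a scatter-gather in Python. This is only an illustration of the flow, not IWA code: the function names, the (region, amount) row shape, and the filter threshold are all hypothetical, and the workers run sequentially here rather than in parallel.

```python
from collections import defaultdict

def worker_scan(rows, min_amount):
    """Step 3 (hypothetical worker): scan, filter, and group one partition."""
    partial = defaultdict(int)
    for region, amount in rows:
        if amount >= min_amount:        # filter
            partial[region] += amount   # group and aggregate
    return dict(partial)

def coordinator(partitions, min_amount, first_n):
    # Step 2: send the query to every worker (sequential stand-in for parallel workers).
    partials = [worker_scan(part, min_amount) for part in partitions]
    # Step 4: merge intermediate results, then ORDER BY total DESC and keep FIRST n.
    merged = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    ordered = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:first_n]

# Three workers, each holding its own slice of the fact data.
partitions = [
    [("east", 10), ("west", 5), ("east", 1)],
    [("west", 20), ("north", 7)],
    [("east", 4), ("north", 2), ("south", 9)],
]
result = coordinator(partitions, min_amount=2, first_n=2)
print(result)
```

Each worker returns only a small per-group summary, so the merge step on the coordinator handles far less data than the scans do.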
IWA: Multi-core and Multi-node environment
Compressed and Partitioned Data
[Diagram: a query executor feeds cells (Cell 1, Cell 2, Cell 3) to cores, each core with its own cache (HT); dictionaries alongside]
• Cell is also the unit of processing; each cell…
– Is assigned to one core
– Has its own hash table in cache (so no shared object that needs latching!)
• Main operator: SCAN over compressed, main-memory table
– Do selections, GROUP BY, and aggregation as part of this SCAN
– Only need to decompress for aggregation
• Response time ∝ (database size) / (# cores × # nodes)
– Embarrassing parallelism: little data exchange across nodes
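The response-time proportionality on this slide can be written as a rough model. The constant c (time per unit of data per core) is an assumption introduced here for illustration and does not come from the slides; the model also ignores clock speed, memory bandwidth, and merge cost.

```latex
T \approx c \cdot \frac{S}{N_{\text{cores}} \times N_{\text{nodes}}}
\qquad\Longrightarrow\qquad
\frac{T_2}{T_1} = \frac{N_{\text{cores},1} \, N_{\text{nodes},1}}{N_{\text{cores},2} \, N_{\text{nodes},2}}
\quad \text{(same } S \text{ and } c\text{)}
```

Under this model, moving from 8 to 10 cores per node at a fixed node count would scale the scan time by 8/10 = 0.8.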
Exploiting Larger Memory: Row Oriented Data Store
Each row stored sequentially
• Optimized for record I/O
• Fetch and decompress entire row, every time
• Result:
– Very efficient for transactional workloads
– Not always efficient for analytical workloads
If only a few columns are required, the complete row is still fetched and uncompressed.
Exploiting Larger Memory: Data is Processed in Compressed Format
• Within a Register-Store, several columns are grouped together.
• The sum of the widths of the compressed columns doesn't exceed a register-compatible width. This utilizes the full capabilities of a 64-bit system. It doesn't matter how many columns are placed within the register-wide data element.
• It is beneficial to place commonly used columns within the same register-wide data element. But this requires dynamic knowledge about the executed workload (runtime statistics).
• Having multiple columns within the same register-wide data element prevents ANDing of different results.
The Register-Store is an optimization of the Column-Store approach where we try to make the best use of existing hardware. Reshuffling small data elements at runtime into a register is time consuming and can be avoided. The Register-Store also delivers good vectorization capabilities.
Predicate evaluation is done against compressed data!
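As a minimal sketch of the register-store idea, the Python below groups compressed columns into 64-bit elements and packs one row's codes into a single word. The first-fit grouping is an assumption made for illustration (the slide says good grouping needs workload knowledge), and the column names, widths, and codes are hypothetical.

```python
REGISTER_BITS = 64

def group_columns(widths):
    """Greedy first-fit (an assumption): put each column in the first bank with room."""
    banks = []
    for name, bits in widths.items():
        if bits > REGISTER_BITS:
            raise ValueError(f"{name} is wider than a register")
        for bank in banks:
            if sum(widths[c] for c in bank) + bits <= REGISTER_BITS:
                bank.append(name)
                break
        else:
            banks.append([name])
    return banks

def pack(bank, widths, codes):
    """Concatenate the compressed codes of one row into a register-wide integer."""
    word = 0
    for name in bank:
        word = (word << widths[name]) | codes[name]
    return word

def unpack(bank, widths, word):
    """Recover the per-column codes from a packed word."""
    codes = {}
    for name in reversed(bank):
        codes[name] = word & ((1 << widths[name]) - 1)
        word >>= widths[name]
    return codes

# Hypothetical compressed column widths, in bits.
widths = {"state": 6, "quarter": 3, "product": 20, "customer": 30, "store": 12}
banks = group_columns(widths)
row = {"state": 9, "quarter": 3, "product": 12345, "customer": 99999, "store": 77}
word = pack(banks[0], widths, row)
print(banks, hex(word), unpack(banks[0], widths, word))
```

Because every field sits at a fixed offset inside the word, a predicate can be tested with a mask and a compare on the packed value without decompressing the row.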
Exploiting Large Memory: Compression: Frequency Partitioning
[Diagram: a Trade Info table (volume, product, origin country). A histogram on Origin (number of occurrences) separates common values (China, USA; GER, FRA, …) from rare values (Rest). A histogram on Product separates the top 64 traded goods (6-bit code) from the Rest. The column partitions (Vol, Prod, Origin) divide the table into cells, Cell 1 to Cell 6.]
• Field lengths vary between cells
• Higher frequencies → shorter codes (approximate Huffman)
• Field lengths fixed within cells
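A small Python sketch of frequency partitioning for a single column, under simplifying assumptions: just two partitions per column (a "hot" one with short codes and a rest partition), with hypothetical sample data and function names. The slides do not specify how IWA chooses partition boundaries.

```python
from collections import Counter
from math import ceil, log2

def partition_column(values, hot_bits):
    """Give the 2**hot_bits most frequent values short codes; the rest get longer codes."""
    counts = Counter(values)
    ranked = [v for v, _ in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))]
    hot, rest = ranked[: 2 ** hot_bits], ranked[2 ** hot_bits:]
    rest_bits = max(1, ceil(log2(len(rest)))) if rest else 0
    dictionaries = [
        {v: code for code, v in enumerate(hot)},
        {v: code for code, v in enumerate(rest)},
    ]
    return dictionaries, [hot_bits, rest_bits]

def encode(value, dictionaries):
    """Return (partition id, code); the code length is fixed within a partition."""
    for part, dictionary in enumerate(dictionaries):
        if value in dictionary:
            return part, dictionary[value]
    raise KeyError(value)

# Hypothetical Origin column: two very common values and a tail of rare ones.
origins = ["China"] * 5 + ["USA"] * 4 + ["GER"] * 2 + ["FRA", "ITA", "ESP"]
dicts, bits = partition_column(origins, hot_bits=1)
print(bits, encode("USA", dicts), encode("ITA", dicts))
```

A row's cell would then be the tuple of partition ids across its columns, which is why field lengths are fixed inside a cell but differ from cell to cell.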
IWA: SIMD: Register Stores Facilitate SIMD Parallelism
• Access only the banks referenced in the query (like a column store):
    SELECT SUM(T.G)
    FROM T
    WHERE T.A > 5
    GROUP BY T.D
• Pack multiple rows from the same bank into the 128-bit register
• Enables yet another layer of parallelism: SIMD (Single-Instruction, Multiple-Data)!
[Diagram: a cell block is split into Bank β1 (32 bits; columns A, D, G), Bank β2 (32 bits; columns B, E, F) and Bank β3 (16 bits; columns C, H). Four 32-bit rows of one bank fill a 128-bit register; one vector operation applies an operand to all four and yields Result1 to Result4.]
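The bank layout can be emulated in plain Python to show which data the example query touches. This is a scalar emulation, not real SIMD: Python integers stand in for 128-bit registers, and the field widths (8 bits for A and D, 16 bits for G) are hypothetical.

```python
A_BITS, D_BITS, G_BITS = 8, 8, 16   # hypothetical widths; together one 32-bit bank row
LANE_BITS, LANES = 32, 4            # four 32-bit rows per 128-bit register

def pack_row(a, d, g):
    """One row of bank 1: columns A, D, G side by side in 32 bits."""
    return (a << (D_BITS + G_BITS)) | (d << G_BITS) | g

def pack_register(words):
    """Place four bank rows into one 128-bit value, one per lane."""
    reg = 0
    for lane, word in enumerate(words):
        reg |= word << (lane * LANE_BITS)
    return reg

def lanes(reg):
    mask = (1 << LANE_BITS) - 1
    return [(reg >> (i * LANE_BITS)) & mask for i in range(LANES)]

def query(bank1_registers):
    """SELECT SUM(G) FROM T WHERE A > 5 GROUP BY D, reading bank 1 only."""
    sums = {}
    for reg in bank1_registers:
        for word in lanes(reg):   # real SIMD would handle all four lanes at once
            a = word >> (D_BITS + G_BITS)
            d = (word >> G_BITS) & ((1 << D_BITS) - 1)
            g = word & ((1 << G_BITS) - 1)
            if a > 5:
                sums[d] = sums.get(d, 0) + g
    return sums

rows = [(7, 1, 100), (3, 1, 50), (9, 2, 10), (6, 1, 5),
        (2, 2, 1), (8, 2, 40), (10, 3, 7), (1, 3, 9)]
registers = [pack_register([pack_row(*r) for r in rows[i:i + LANES]])
             for i in range(0, len(rows), LANES)]
totals = query(registers)
print(totals)
```

The banks holding columns B, E, F, C, and H are never read, which mirrors the column-store-like access described on the slide.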
IWA: SIMD: Simultaneous Evaluation of Equality Predicates
• CPU operates on 128-bit units
• Lots of fields fit in 128 bits
• These fields are at fixed offsets
• Apply predicates to all columns simultaneously!
Example: State == 'CA' && Quarter == 'Q4'
– Translate the value query to a code query: State == 01001 && Quarter == 1110
– Row (State, Quarter and other fields): … … … …
– Mask: 11111 0 1111 0
– AND the row with the mask (&), then compare (==) with the pattern 01001 0 1110 0 to get the selection result
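The mask-and-compare step can be sketched in Python using the bit patterns from the slide. The field layout is an assumption made for illustration: State in 5 bits, one bit belonging to some other field, Quarter in 4 bits, then one more unrelated bit.

```python
# Assumed layout (high to low): State (5 bits) | other (1) | Quarter (4 bits) | other (1)
MASK = 0b11111_0_1111_0   # ones over the State and Quarter fields only

def make_pattern(state_code, quarter_code):
    """Place the dictionary codes at the fields' fixed offsets."""
    return (state_code << 6) | (quarter_code << 1)

def matches(row_word, pattern):
    """One AND plus one compare evaluates both equality predicates at once."""
    return (row_word & MASK) == pattern

pattern = make_pattern(0b01001, 0b1110)   # State == 'CA' && Quarter == 'Q4' as codes
rows = [
    0b01001_1_1110_1,   # both fields match; the unrelated bits are ignored
    0b01001_0_1100_0,   # Quarter differs
    0b11001_0_1110_0,   # State differs
]
selection = [matches(r, pattern) for r in rows]
print(bin(pattern), selection)
```

With a 128-bit register the same two operations would cover every field packed into the register, which is the simultaneity the slide refers to.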
Exploiting Large on-chip Cache
• Encoding makes grouping simple!
– Coded values are assigned densely (by construction)
– Hence, in principle, grouping is simple: aggTable[group] += aggValue
• Challenges:
– Fitting the hash table in L2 cache
– Avoiding all branches in hash table lookup
• IWA adaptively uses one of 2 techniques, depending on # of distinct groups
1. Use dictionary code as a perfect hash (i.e. collision-free), OR
  • aggTable[groupCode] += aggValue
  • No branches, no hash function computation
  • Works great if groupCode is dense, i.e., a single column, or multiple columns with little correlation
2. Use usual linear probing
  • Involves branches, random access, …
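The two grouping techniques can be contrasted in a short Python sketch. Function names and sample data are hypothetical; the linear-probing version is a textbook open-addressing table, not IWA's actual implementation, and it assumes the capacity exceeds the number of distinct keys.

```python
def aggregate_perfect_hash(group_codes, values, num_codes):
    """Technique 1: the dense dictionary code is the array index; no probing, no hashing."""
    agg_table = [0] * num_codes
    for code, value in zip(group_codes, values):
        agg_table[code] += value
    return agg_table

def aggregate_linear_probing(keys, values, capacity):
    """Technique 2: open addressing; capacity must exceed the number of distinct keys."""
    slots = [None] * capacity                 # each used slot holds [key, running total]
    for key, value in zip(keys, values):
        i = hash(key) % capacity
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % capacity            # branchy, data-dependent probing
        if slots[i] is None:
            slots[i] = [key, 0]
        slots[i][1] += value
    return {slot[0]: slot[1] for slot in slots if slot is not None}

codes = [0, 2, 1, 2, 0]
amounts = [10, 5, 7, 1, 3]
dense = aggregate_perfect_hash(codes, amounts, num_codes=3)
probed = aggregate_linear_probing(codes, amounts, capacity=8)
print(dense, probed)
```

The first version's table size is bounded by the number of codes, so for a small group count it can stay resident in L2 cache, which is the point the slide makes.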
Case Study #1: U.S. Government Agency
Case Study #2: Datamart at a Government Agency
• A MicroStrategy report was run, which generates 667 SQL statements, of which 537 were SELECT statements
• The datamart for this report has 250 tables and 30 GB data size
• The original report on XPS and Sun SPARC M9000 took 90 mins
• With IDS 11.7 on a Linux Intel box, it took 40 mins
• With IWA, it took 67 seconds
Case Study #3: Skechers, USA. Shoe Retailer
• Top 7 time-consuming queries in Retail BI and Warehouse (against 1 billion row fact tables):
Query   IDS (11.5 or 11.7)   IWA
1       22 mins              4 secs
2       1 min 3 secs         2 secs
3       3 mins 40 secs       2 secs
4       30 mins & up         4 secs
5       2 mins               2 secs
6       30 mins              2 secs
7       45 mins & up         2 secs
Query acceleration 30x to 1400x; average acceleration 450x
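The acceleration figures can be checked against the table with a few lines of Python. This is only arithmetic on the times as printed; "& up" entries are treated as their stated lower bound, so the real ratios for those queries may be higher.

```python
import re

def to_seconds(text):
    """Convert strings such as '1 min 3 secs' or '30 mins & up' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(min|sec)", text):
        total += int(value) * (60 if unit == "min" else 1)
    return total

# (IDS time, IWA time) for queries 1 to 7, as listed in the table.
timings = [
    ("22 mins", "4 secs"),
    ("1 min 3 secs", "2 secs"),
    ("3 mins 40 secs", "2 secs"),
    ("30 mins & up", "4 secs"),
    ("2 mins", "2 secs"),
    ("30 mins", "2 secs"),
    ("45 mins & up", "2 secs"),
]
speedups = [to_seconds(ids) / to_seconds(iwa) for ids, iwa in timings]
average = sum(speedups) / len(speedups)
print(speedups, round(average, 1))
```

The computed range (about 31x to 1350x) and mean (about 460x) are in line with the rounded figures quoted on the slide.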
Systems Tested
• 4S Intel® Xeon® 7560 (whitebox)
– 2.26 GHz 8C CPU
• 4S Intel® Xeon® E7 4870 (whitebox)
– 2.40 GHz 10C CPU
– 256GB 1066MHz DDR3 memory
• 8S Intel® Xeon® E7 7560 (IBM® System x3850 X5)
– 2.26 GHz 8C CPU
– 2TB 1066MHz DDR3 memory
POPS schema
[Schema diagram: tables daily_sales, daily_forecast, Customer, Store, Period, Product, Promotion; row counts shown: 350 million rows and 1 billion rows]
Systems Tested
• 8S Intel® Xeon® E7 7560 (IBM® System x3850 X5)
– 2.26 GHz 8C CPU
– 2TB 1066MHz DDR3 memory
Store Sales ER-Diagram: 500 GB SSED
[ER diagram; row counts shown on its tables: 4,594,771,672; 20; 73,049; 1,920,800; 1,000; 204,000; 1,000,000; 402; 86,400; 7,200; 2,000,000]
Improvement   IDS AVG       IDS3       IDS2       IDS1       IWA AVG      IWA4      IWA3      IWA2      (IWA1)
3725.950107   5307411       5393336    5165693    5363204    142444.5     139664    142908    145702    141504
7255.532077   2034469.333   2040515    2037336    2025557    28040.25     24197     27391     29582     30991
1360.256789   2058833.667   2060862    2065049    2050590    151356.25    151100    149153    152408    152764
26663.90235   2173508       2181312    2179435    2159777    8151.5       8115      8054      8207      8230
9088.18055    5295342       5266918    5307202    5311906    58266.25     58522     55436     59341     59766
1879.356354   1627433.333   1606029    1656098    1620173    86595.25     91692     83824     85711     85154
4396.184776   2269596.333   2269677    2264463    2274649    51626.5      52788     55660     50415     47643
3111.8097     1561521.667   1568083    1570163    1546319    50180.5      49932     51374     52354     47062
28077.09214   2525253.667   2522585    2529234    2523942    8994         9083      8724      9132      9037
9162.687923   5310510.667   5325095    5310507    5295930    57958        60119     56437     58315     56961
4841.715267   2270595       2281946    2278167    2251672    46896.5      47031     46573     46582     47400
50383.73908   22082689      21903105   22354449   21990513   43829        43215     45837     42441     43823
3278.413035   1545935.667   1563874    1547404    1526529    47155        43668     50032     46470     48450
3583.705737   1560157.333   1544912    1557525    1578035    43534.75     44918     44463     46412     38346
4945.790037   5985222       5947525    6044626    5963515    121016.5     117572    123593    123030    119871
8024.262212   2310265.333   2517291    2211549    2201956    28791        29362     30083     28075     27644
5454.156955   1526850.333   1528724    1526089    1525738    27994.25     29846     24602     29301     28228
9146.463537   3163944.667   3150876    3173656    3167302    34592        31651     35579     33551     37587
1462.623635   1726097       1690400    1722746    1765145    118013.75    117902    117513    117053    119587
2004.222772   1898490       1899916    1884782    1910772    94724.5      92691     95638     97192     93377
5460.128295   1538514       1538959    1538364    1538219    28177.25     27417     26927     27175     31190
3295.179104   3324926.333   3341873    3338352    3294554    100902.75    97666     92653     104246    109046
Column totals:              -          79262889   -          1379240.25   1368151   1372454   1392695   1383661
ArithMean of Improvement: 8936.42511
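The relationship between the columns is not stated on the slide, but the numbers are consistent with each AVG column being the mean of its runs and Improvement being 100 × IDS AVG / IWA AVG. A short Python check on one row, with that formula treated as an inference rather than a documented definition:

```python
def summarize(ids_runs, iwa_runs):
    """Return (IDS average, IWA average, improvement), using the inferred formula."""
    ids_avg = sum(ids_runs) / len(ids_runs)
    iwa_avg = sum(iwa_runs) / len(iwa_runs)
    return ids_avg, iwa_avg, 100 * ids_avg / iwa_avg

# Runs taken from the row whose Improvement value is 3725.950107.
ids_avg, iwa_avg, improvement = summarize(
    ids_runs=[5393336, 5165693, 5363204],
    iwa_runs=[139664, 142908, 145702, 141504],
)
print(ids_avg, iwa_avg, round(improvement, 6))
```

The units of the timings are not given in the source, so only the ratios are meaningful here.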