
Distributed architecture of oracle database in memory

Date post: 15-Apr-2017
Upload: suresh-kumar-mukhiya
Distributed Architecture of Oracle Database In-Memory

Håkon Åmdal, Suresh Kumar Mukhiya

Niloy Mukherjee, Shasank Chavan, Maria Colgan, Dinesh Das, Mike Gleeson, Sanket Hase, Allison Holloway, Hui Jin, Jesse Kamp, Kartik Kulkarni, Tirthankar Lahiri, Juan Loaiza, Neil Macnaughton, Vineet Marwah, Atrayee Mullick, Andy Witkowski, Jiaqi Yan, Mohamed Zait

1/25

Motivation

Ad-hoc real-time analysis (OLAP)

High performance

Large amount of data

All without explicit optimizer plan changes or query rewrites

… while still keeping the system suitable for transactional workloads (OLTP)

2/25

Oracle Database In-Memory Dual Format

Row storage for online transactional processing (OLTP)

Column storage for online analytical processing (OLAP)

Database objects optimized for memory, while still being persisted on disk.

Memory is cheap

Memory shifts from being a cache for disk access to being primary storage.

3/25

Need for a distributed architecture

Scaling out

Scaling out allows for elastic expansion

Avoid single point of failure

Hard to program?

Scaling up

Majority of analytics do not process huge datasets at the same time

Cheap hardware on a single server can process 90% of Facebook’s jobs.

Main memory bus might become bottleneck

Simpler implementation?

Single point of failure

Long recovery period

“We are motivated by these observations to design an extremely scalable, high-available fault-tolerant distributed architecture within the Oracle Database In-Memory Option”

4/25

Distributed Oracle DBIM

Oracle Real Application Clusters (RAC) allows for scaling out across multiple machines.

“Shared nothing”-architecture

Persisted in row-based blocks

5/25

In-Memory compression units (IMCUs)

● Compressed with user-defined compression levels

○ Optimized for OLTP

○ Optimized for OLAP

○ Optimized for storage

● High performance

○ Single Instruction, Multiple Data (SIMD) instructions

○ In-memory storage indexes

○ Bloom-filter-based joins
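As a minimal sketch of how an in-memory storage index could prune work, the Python below records a minimum and maximum per column unit and skips any unit whose range cannot satisfy an equality predicate. The names `ColumnUnit` and `scan_equal` are hypothetical and are not Oracle's API; real IMCUs are compressed and considerably more elaborate.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ColumnUnit:
    """Hypothetical stand-in for one column of an IMCU."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # The "storage index": min and max recorded once at population time.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equal(units: List[ColumnUnit], target: int) -> Tuple[int, int]:
    """Return (matching row count, number of units actually scanned)."""
    matches = 0
    scanned = 0
    for unit in units:
        # Prune: the target cannot occur in this unit, so skip it entirely.
        if target < unit.min_value or target > unit.max_value:
            continue
        scanned += 1
        matches += sum(1 for v in unit.values if v == target)
    return matches, scanned


units = [
    ColumnUnit(list(range(0, 100))),
    ColumnUnit(list(range(100, 200))),
    ColumnUnit(list(range(200, 300))),
]
result = scan_equal(units, 150)
print(result)  # (1, 1): one match, and only one of the three units was scanned
```

Pruning of this kind pays off most when values are clustered, as in the contiguous ranges above; with randomly scattered values every unit's range would cover the target and nothing would be skipped.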

6/25

Shared Database Buffer Cache

Global Cache Service (GCS) tracks and maintains locations and access modes of all data blocks in the global cache.

Handles all OLTP operations

Guarantees strict ACID and robustness properties

Cache Fusion Protocol

7/25

In-memory column store

Shared-nothing container of in-memory segments on each instance

Distributed together with the underlying row blocks

Falls back to disk storage if not present in-memory

8/25

In-memory Transaction manager

Each server is responsible for transactional consistency for incoming DML statements

When looking up the transactional log causes too much overhead, the IMCUs are rebuilt

9/25

Distribution manager

a. extremely scalable application-transparent distribution of IMCUs across a RAC cluster allowing for efficient utilization of collective memory across in-memory column stores

b. high availability of IMCUs across the cluster guaranteeing in-memory fault-tolerance for queries

c. application-transparent distribution of IMCUs across NUMA nodes within a single server to improve vertical scale-up performance

d. efficient recovery against instance failures by guaranteeing minimal rebalancing of IMCUs on cluster topology changes

e. seamless interaction with Oracle’s SQL execution engine ensuring affinitized high performance parallel scan execution at local memory bandwidths, without explicit optimizer plan changes.

10/25

Distribution scheme

Partition

Subpartition

Block range

Auto

11/25

Distribution mechanism

Global phase for distribution consensus

Decentralized population phase

Each instance comes up with the same object location using a hashing algorithm

Spread equally across a NUMA node

12/25
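The idea that every instance independently arrives at the same object location can be sketched with rendezvous (highest-random-weight) hashing. This is an illustrative choice, assumed here because it needs no coordinator; the slides only say "a hashing algorithm". The instance names, the IMCU identifiers and the helper functions are all hypothetical.

```python
import hashlib
from typing import List


def weight(imcu_id: str, instance: str) -> int:
    """Deterministic pseudo-random weight for an (IMCU, instance) pair."""
    digest = hashlib.sha256(f"{imcu_id}|{instance}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def home_instance(imcu_id: str, instances: List[str]) -> str:
    """Pick the instance with the highest weight for this IMCU."""
    return max(instances, key=lambda inst: weight(imcu_id, inst))


instances = ["inst1", "inst2", "inst3", "inst4"]
imcus = [f"sales:blocks:{i}" for i in range(8)]

# Two instances compute the placement independently, even with the
# instance list in a different order, and still agree on every IMCU.
view_a = {u: home_instance(u, instances) for u in imcus}
view_b = {u: home_instance(u, list(reversed(instances))) for u in imcus}
print(view_a == view_b)  # True
```

Because the result depends only on the IMCU identifier and the set of live instances, the global phase only has to agree on cluster membership; the population phase can then run without further coordination.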

Redistribution

If a server goes down, a new distribution is calculated

Only the objects whose location changes are moved; all others stay where they are.
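A small sketch of why a recomputed distribution need not move everything, assuming a rendezvous-style hash (an assumption for illustration, not confirmed by the slides as the actual scheme). When one instance is removed, only the objects that were homed on it get a new location. All names are hypothetical.

```python
import hashlib
from typing import Dict, List


def home_instance(imcu_id: str, instances: List[str]) -> str:
    """Highest-random-weight choice among the live instances."""
    def weight(inst: str) -> int:
        digest = hashlib.sha256(f"{imcu_id}|{inst}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(instances, key=weight)


def placement(imcu_ids: List[str], instances: List[str]) -> Dict[str, str]:
    return {u: home_instance(u, instances) for u in imcu_ids}


imcus = [f"imcu-{i}" for i in range(1000)]
before = placement(imcus, ["inst1", "inst2", "inst3", "inst4"])
after = placement(imcus, ["inst1", "inst2", "inst4"])  # inst3 has failed

moved = [u for u in imcus if before[u] != after[u]]
# Every IMCU that moved used to live on the failed instance.
only_failed_moved = all(before[u] == "inst3" for u in moved)
print(only_failed_moved)  # True
```

A plain `hash(id) % number_of_instances` scheme would not have this property: changing the divisor reassigns most objects, including those on healthy instances.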

13/25

Availability

None

1-safe

(N-1)-safe
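The three levels can be read as the number of in-memory copies kept of each IMCU. The sketch below assumes that "k-safe" means queries survive the loss of k instances without falling back to disk, so k + 1 copies are needed; the function name is hypothetical, and the 64 GB figure is used as an in-memory size purely for illustration.

```python
def copies_per_imcu(level: str, n_instances: int) -> int:
    """In-memory copies of each IMCU under a given redundancy level."""
    if level == "none":
        return 1              # single copy, no in-memory redundancy
    if level == "1-safe":
        return 2              # survives the loss of one instance
    if level == "(N-1)-safe":
        return n_instances    # a copy on every instance
    raise ValueError(f"unknown level: {level}")


n = 8            # instances in the cluster
table_gb = 64    # illustrative in-memory size of one table
footprint = {
    level: copies_per_imcu(level, n) * table_gb
    for level in ("none", "1-safe", "(N-1)-safe")
}
print(footprint)  # {'none': 64, '1-safe': 128, '(N-1)-safe': 512}
```

The trade-off is collective memory against fault tolerance: full duplication tolerates the most failures but leaves the cluster with the effective capacity of a single instance.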

14/25

Distributed SQL Execution

15/25

Uniqueness of Architecture

The paper compares the architecture with SAP HANA and IBM DB2 with BLU with respect to:

Distribution

Scalability

Availability

Recovery

The architecture provides a complete scale-out solution with collective memory utilization, redundancy, availability and efficient failure handling by redistribution.

16/25

Evaluation

Hardware Setup

Distribution Experiments

Distributed Query Execution

In-Memory Distribution Awareness

In-Memory Fault Tolerance

NUMA Aware Query Execution

The evaluation validates quality attributes such as:

● Performance

● Scalability

● Availability

17/25

Evaluation - Hardware Setup

Conducted on Oracle Exadata Database Machine, a state-of-the-art database SMP server and storage cluster system

The NUMA experiment is conducted on an X4-8 single-node machine with eight 15-core Intel Xeon processors and 2 TB DRAM

The rest of the experiments are conducted on an X4-2 RAC configuration comprising up to 8 database server nodes, each with two 12-core Intel Xeon processors and 256 GB DRAM, and 14 shared storage servers amounting to 200 TB total storage capacity

18/25

Evaluation - Distribution Experiments

To verify whether IMCU throughput scales out with the number of database server instances in the RAC Cluster.

Two experiments were performed:

Non-partitioned Table Distribution

13-column and 1 billion row non-partitioned atomic table with size of 64 GB

Composite-partitioned Table Distribution

TPC-H lineitem schema is chosen for this experiment.

19/25

Evaluation - Distributed Query Execution

A set of 3 experiments is performed on the 13-column, 64 GB atomics table

The table is auto-distributed based on block ranges without redundancy

Four sets of queries are selected for each of these experiments

20/25

Evaluation - Distributed Query Execution

Q1, Q2, Q3: non-linear scale out where queries are CPU-bound

Queries in set 4 exercise in-memory storages. Throughput of such queries is not expected to scale with number of instances.

21/25

Evaluation - In-Memory Distribution Awareness

To observe and validate the impact of in-memory distribution awareness on cluster-wide analytic query performance.

Performance gains on the order of 20x to 40x over executions with distribution awareness disabled in the parallel query granule generation phase.
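To make "distribution awareness in granule generation" concrete, here is a hedged sketch: each scan granule (a block range) has a home instance holding its IMCU, and the scheduler either sends the granule there or deals granules out round-robin without looking at location. The layout, the names and the round-robin baseline are assumptions for illustration only, not the actual Oracle scheduler.

```python
from typing import Dict, List


def assign_granules(homes: Dict[int, str], instances: List[str],
                    aware: bool) -> Dict[int, str]:
    """Map each scan granule to the instance that will process it."""
    if aware:
        # Scan each granule on the instance that holds its IMCU.
        return dict(homes)
    # Unaware baseline: round-robin, ignoring where the IMCU lives.
    return {g: instances[i % len(instances)]
            for i, g in enumerate(sorted(homes))}


def local_fraction(assignment: Dict[int, str], homes: Dict[int, str]) -> float:
    """Fraction of granules scanned where their IMCU is in memory."""
    hits = sum(1 for g, inst in assignment.items() if homes[g] == inst)
    return hits / len(assignment)


instances = ["inst1", "inst2", "inst3", "inst4"]
# Hypothetical layout: 16 granules, contiguous ranges on the same instance.
homes = {g: instances[g // 4] for g in range(16)}

aware = local_fraction(assign_granules(homes, instances, True), homes)
unaware = local_fraction(assign_granules(homes, instances, False), homes)
print(aware, unaware)  # 1.0 0.25
```

In this toy layout three out of four granules miss the column store under the unaware scheduler; since a miss means reading row blocks instead of scanning an IMCU, a large gap between the two modes is plausible.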

22/25

Evaluation - In-Memory Fault Tolerance

To validate in-memory fault tolerance of distributed query execution under 1-safe redundancy.

Single instance failure has no visible effect on the elapsed times of the queries.

23/25

Evaluation - NUMA-Aware Query Execution

To observe whether IMCU NUMA-affined query execution yields better throughput compared to the same in-memory execution but without NUMA awareness.

150-250% improvement in query elapsed times when execution is IMCU NUMA-aware.

24/25

Conclusion

This paper summarizes:

Motivation for development of in-memory database optimized for OLTAP environment

Fault-tolerant distributed architecture of Oracle database in-memory option

Uniqueness of the architecture among its peers

Evaluation with sets of experiments showing how performance can be enhanced with in-memory option

25/25

