Modernizing data management with vFabric


#SGvFabric

vSphere 5

vFabric: What’s in it?

Application Services vFabric

Frameworks & Tools: Rich Web, Social and Mobile, Data Access, Integration Patterns, Batch Framework, Spring Tool Suite

Performance, Management: Hyperic / Insight

Application Server: tc Server

Web Runtime: ERS

Elastic Data Grid: GemFire / SQLFire

Messaging: RabbitMQ

EM4J Data Director

vCops/APM

DBaaS vPostgres

Cloud-scale challenge…

Challenge

Managing online applications at cloud scale is hard. As the number of users grows, the database becomes the bottleneck.

DB Bottleneck

Scales…

Traditional databases were never designed to support thousands of concurrent users.

Traditional DB Characteristics

§ Designed against constraints that are no longer relevant
• Network unreliable/slow
• RAM prices prohibitive

§ One size fits all
• Designed for everything, optimized for nothing
• Often incompatible with modern workloads

§ Centralized in nature
• Data change capture an afterthought
• Lacks data partitioning facilities

§ Obsessed with ACID
• Constant contention for resources causes locks

§ Monolithic design

§ Requires lots of hardware to

Traditional DB Loves IO

First write to LOG

Second write to Data files

Buffers primarily tuned for IO

30% Data Btrees keys Logging Locking Latching Buffer management

Source: Research by MIT and Brown: “OLTP Under the Looking Glass” by S. Harizopoulos, D. J. Abadi, S. Madden, M. Stonebraker, SIGMOD 2008.

Chart: Transaction in Traditional DB (percentage of computer cycles, based on 3.5M sample)

Cloud-scale solution…

Apparent Choices

Build an expensive database clustering solution, or undertake a lengthy rewrite for “big data”?

Next generation option

SQLFire is different; it’s built for speed and scale.

Scale much?

Speak SQL?

New Approach

Elastic, in-memory database designed specifically for speed and low latency accessible through a familiar SQL interface.

SQLFire Characteristics

§ Highly concurrent data structures resident in and optimized for main memory

§ Rethink ACID transactions; all state resides in distributed memory to avoid any single points of contention

§ Partition-aware DB design spreads workloads across both data set and physical nodes

§ Shared nothing logs on disk; application writes are never exposed to the disk seek latencies

§ Parallelize data access and application behavior; dynamically “shard SQL”

§ Dynamic rebalancing of data as cluster size grows/shrinks. Most efficient way of managing resources/data.
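The rebalancing bullet can be pictured with a small sketch. It assumes data lives in a fixed set of buckets that are reassigned as servers join or leave; the function name `rebalance`, the bucket count of 12, and the server names are hypothetical, and this is not SQLFire's actual algorithm.

```python
from collections import Counter, defaultdict

def rebalance(assignment, servers):
    """Return a balanced bucket -> server map, moving as few buckets as possible."""
    servers = list(servers)
    load = defaultdict(list)
    orphans = []  # buckets that currently have no live owner
    for bucket, server in sorted(assignment.items()):
        if server in servers:
            load[server].append(bucket)
        else:
            orphans.append(bucket)
    base, extra = divmod(len(assignment), len(servers))
    # Servers already holding the most buckets keep any leftover slots.
    ranked = sorted(servers, key=lambda s: -len(load[s]))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(ranked)}
    for s in servers:  # take buckets away from overloaded servers
        while len(load[s]) > quota[s]:
            orphans.append(load[s].pop())
    new_assignment = {}
    for s in servers:  # hand orphaned buckets to underloaded servers
        while len(load[s]) < quota[s]:
            load[s].append(orphans.pop())
        for bucket in load[s]:
            new_assignment[bucket] = s
    return new_assignment

# Hypothetical cluster: 12 buckets on two servers, then a third server joins.
start = rebalance({b: None for b in range(12)}, ["s1", "s2"])
grown = rebalance(start, ["s1", "s2", "s3"])
moved = [b for b in start if start[b] != grown[b]]
counts = Counter(grown.values())
```

In this sketch only the buckets handed to the new server move; everything else stays where it was.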

SQLFire speed…

SQLFire v Traditional Databases

SQLFire response times are faster and more consistent under increased database load.

Sample Comparison

§ Spring Travel Application
§ Similar hardware (8 vCPU, 4GB)
§ Out-of-the-box configuration

SQLF R/T (ms)   SQLF CPU %   MySQL R/T (ms)   MySQL CPU %
14              9            25               1
8               32           23               19
5               61           172              76
6               77           fail             fail
984             98           fail             fail

Response Time (chart, by threads): MySQL increased with load; SQLFire near constant and much lower.

Number of Threads (chart): MySQL reaches saturation at 1850 threads; SQLFire scales to 7200 threads with 1 second R/T.

Distributed data…

Why Scale Horizontally?

Sub-divide system into independent data sets, eliminate distributed transactions to achieve elasticity, linear scalability and predictable latency.

Horizontal Scalability – Throughput

[Chart: queriesPerSecond (100,000 to 800,000) and client threads against number of servers (2 to 10)]

Horizontal Scalability – Consistency/HA

§  Resiliency through replication, synchronous but in parallel

§  Row updates are always atomic; no need for transactions

§ Shared nothing architecture, including storage

§ Instant failover at protocol level

§ Apps retain their connections

§ Data remains available


Data management strategies…

Data strategies – Partitioning

§  Balances data across SQLFire cluster

§  Delivers redundancy for high availability

[Diagram: three SQLFire nodes, with a write operation (with 2 redundant copies) and a read operation]
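A rough sketch of the write and read paths on a partitioned table with redundant copies. The class `PartitionedCluster`, the integer keys, the bucket count, and the round-robin placement are all assumptions made for illustration; the slides do not say how SQLFire places copies.

```python
class PartitionedCluster:
    """Toy model: each bucket has one primary plus `redundancy` extra copies."""

    def __init__(self, servers, num_buckets=8, redundancy=2):
        if redundancy + 1 > len(servers):
            raise ValueError("need at least redundancy + 1 servers")
        self.servers = list(servers)
        self.num_buckets = num_buckets
        self.redundancy = redundancy
        self.stores = {s: {} for s in self.servers}

    def owners(self, key):
        # Assumed placement: consecutive servers starting at the bucket's slot.
        bucket = key % self.num_buckets
        start = bucket % len(self.servers)
        return [self.servers[(start + i) % len(self.servers)]
                for i in range(self.redundancy + 1)]

    def write(self, key, row):
        # A write reaches the primary and every redundant copy.
        for server in self.owners(key):
            self.stores[server][key] = row

    def read(self, key, unavailable=()):
        # A read is served by the first copy that is still reachable.
        for server in self.owners(key):
            if server not in unavailable:
                return self.stores[server][key]
        raise LookupError("no available copy")

cluster = PartitionedCluster(["s1", "s2", "s3", "s4"])
cluster.write(42, "row-42")
```

With two redundant copies, the row in this sketch stays readable even when two of its three owners are unavailable.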

SQLFire Hash Partitioning

§ Partition by column or primary key
• Can specify multiple columns
• Uses hashCode() for single column or primary key
• Uses serialized bytes for multiple columns
• Creates uniform distribution of data across the cluster

-- Partition by column
CREATE TABLE MY_TABLE ( . . . )
PARTITION BY COLUMN (COLUMN_A)

-- Partition by primary key
CREATE TABLE MY_TABLE ( . . . )
PARTITION BY PRIMARY KEY
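To make the hashCode() bullet concrete, here is a minimal sketch that reproduces Java's String.hashCode() in Python and maps the result to a bucket. The modulo step, the bucket count of 113, and the function names are assumptions for illustration; the slides do not describe how SQLFire turns a hash code into a bucket.

```python
def java_string_hash(s):
    """Java's String.hashCode() for strings of BMP characters (32-bit signed)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def bucket_for(key, num_buckets=113):
    # Assumed mapping: hash code modulo the bucket count. Python's % already
    # yields a non-negative result for a positive modulus.
    return java_string_hash(key) % num_buckets

# The same key always lands in the same bucket, on every node.
first = bucket_for("customer-1001")
second = bucket_for("customer-1001")
```

Because the mapping depends only on the key, any node can work out where a row lives without consulting a central directory.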

SQLFire Range Partitioning

§ Partition by range of column values
• Can specify multiple ranges
• Colocates data in specified ranges
• Used to ensure locality of data in a partition for range queries or cross table

-- Partition by range
CREATE TABLE MY_TABLE ( . . . )
PARTITION BY RANGE (COLUMN_A)
(
  VALUES BETWEEN 1 AND 10,
  VALUES BETWEEN 50 AND 60
)
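A small sketch of how a range lookup could work for the two ranges in the example. It assumes the lower bound is inclusive and the upper bound exclusive, and it returns None for values outside every range; the slides do not state either behaviour, and `range_partition` is a hypothetical name.

```python
RANGES = [(1, 10), (50, 60)]  # mirrors VALUES BETWEEN 1 AND 10 / 50 AND 60

def range_partition(value, ranges=RANGES):
    """Return the index of the range holding `value`, or None if no range matches."""
    for index, (low, high) in enumerate(ranges):
        if low <= value < high:  # assumed: low inclusive, high exclusive
            return index
    return None

# Rows with COLUMN_A = 3 and COLUMN_A = 7 end up in the same partition,
# so a range query over 1..10 touches a single partition.
same_partition = range_partition(3) == range_partition(7)
```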

SQLFire List Partitioning

§ Partition by a set of column values
• Can specify column value sets
• Colocates data with specified column values
• Used to ensure locality of data in a partition for sets of values or cross table

-- Partition by list
CREATE TABLE MY_TABLE ( . . . )
PARTITION BY LIST (COLUMN_A)
(
  VALUES ('VALUE_A', 'VALUE_B'),
  VALUES ('VALUE_Y', 'VALUE_Z')
)
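List partitioning amounts to a lookup from column value to partition. The sketch below builds that lookup for the value sets in the example; `build_list_index` is a hypothetical helper, and returning None for unlisted values is an assumption, since the slides do not say where such rows go.

```python
def build_list_index(value_lists):
    """Map each listed value to the index of the value set that contains it."""
    index = {}
    for partition, values in enumerate(value_lists):
        for value in values:
            if value in index:
                raise ValueError("value %r appears in more than one list" % (value,))
            index[value] = partition
    return index

# Mirrors VALUES ('VALUE_A', 'VALUE_B'), VALUES ('VALUE_Y', 'VALUE_Z')
INDEX = build_list_index([("VALUE_A", "VALUE_B"), ("VALUE_Y", "VALUE_Z")])

def list_partition(value):
    return INDEX.get(value)  # None for values not named in any list
```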

SQLFire Expression Partitioning

§ Partition by a column expression
• Expression must be valid SQL function
• Must reference only columns in the table
• Hash partition with value determined by the expression

-- Partition by expression
CREATE TABLE MY_TABLE ( . . . )
PARTITION BY ( MONTH ( MY_DATE ) )
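The last bullet says the expression's value is what gets hashed. A minimal sketch of that idea for MONTH(MY_DATE), with rows modelled as dictionaries; the bucket count of 8, the modulo mapping, and the function names are assumptions for illustration.

```python
from datetime import date

def month_of(row):
    """Stand-in for the SQL expression MONTH(MY_DATE)."""
    return row["MY_DATE"].month

def expression_bucket(row, expression=month_of, num_buckets=8):
    # Hash the expression's value, not the raw column. Python hashes small
    # non-negative ints to themselves, so the result is deterministic here.
    return hash(expression(row)) % num_buckets

# Rows from the same calendar month share a bucket, whatever the year.
march_2011 = expression_bucket({"MY_DATE": date(2011, 3, 5)})
march_2012 = expression_bucket({"MY_DATE": date(2012, 3, 28)})
```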

SQLFire Default Partitioning

§ Default hash partitioning strategy
• Start server with table-default-partitioned property set to true
• First foreign key whose referenced primary key is also a partition column
• Primary key
• First unique key
• SQLFire-generated row id

-- No PARTITION BY clauses
CREATE TABLE MY_TABLE (
  COLUMN_A INT NOT NULL CONSTRAINT A_PK PRIMARY KEY,
  . . .
)

CREATE TABLE MY_OTHER_TABLE (
  COLUMN_B INT NOT NULL CONSTRAINT B_PK PRIMARY KEY,
  COLUMN_C INT CONSTRAINT A_FK REFERENCES MY_TABLE (COLUMN_A),
  . . .
)
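Reading the four bullets as an order of precedence (an assumption, since the slide does not say so explicitly), the choice of default partitioning column for the two example tables can be sketched as follows. The dictionary layout and the function `default_partition_columns` are hypothetical.

```python
def default_partition_columns(table, partition_columns_by_table):
    """Pick partitioning columns using the assumed order of precedence."""
    # 1. First foreign key whose referenced columns are the parent's partition columns.
    for fk in table.get("foreign_keys", []):
        ref_table, ref_columns = fk["references"]
        if tuple(ref_columns) == partition_columns_by_table.get(ref_table):
            return tuple(fk["columns"])
    # 2. Primary key.
    if table.get("primary_key"):
        return tuple(table["primary_key"])
    # 3. First unique key.
    if table.get("unique_keys"):
        return tuple(table["unique_keys"][0])
    # 4. Fall back to a generated row id.
    return ("<generated row id>",)

TABLES = {
    "MY_TABLE": {"primary_key": ["COLUMN_A"]},
    "MY_OTHER_TABLE": {
        "primary_key": ["COLUMN_B"],
        "foreign_keys": [
            {"columns": ["COLUMN_C"], "references": ("MY_TABLE", ["COLUMN_A"])},
        ],
    },
}

chosen = {}
for name in ("MY_TABLE", "MY_OTHER_TABLE"):  # parent first, then child
    chosen[name] = default_partition_columns(TABLES[name], chosen)
```

Under this reading MY_OTHER_TABLE is partitioned on COLUMN_C rather than on its own primary key, which keeps child rows next to the MY_TABLE rows they reference.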

Data strategies – Replication

§  Copies all data across SQLFire cluster

§  Appropriate for reference data

[Diagram: three SQLFire nodes, with a write operation (with replicated copies) and a read operation]

SQLFire Replicated Tables

§  Created by default with no PARTITION BY clause

§  Created with REPLICATE clause

§  Reference data or fact tables are good candidates

§  Replicates data across all peers in server group

§  Replication is parallel and synchronous

§  Automatic replication failure detection

-- Replication example
CREATE TABLE MY_TABLE ( . . . )
REPLICATE
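"Parallel and synchronous" can be illustrated with a toy write path: the update is sent to every replica at the same time, and the caller gets control back only after all of them have applied it. Replicas are plain dictionaries here, and `replicated_write` is a hypothetical name, not part of any SQLFire API.

```python
from concurrent.futures import ThreadPoolExecutor

def replicated_write(replicas, key, row):
    """Apply a write to all replicas in parallel; return once every replica has it."""
    def apply(store):
        store[key] = row
        return True

    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        # list() waits for every result: parallel dispatch, synchronous completion.
        acknowledgements = list(pool.map(apply, replicas))
    return all(acknowledgements)

# Three peers in a server group, each holding a full copy of a reference table.
peers = [{}, {}, {}]
acknowledged = replicated_write(peers, "US", "United States")
```

Since every peer holds the whole table, a read can then be answered from whichever peer receives it.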

Topology strategies…

Topology: Client-server

[Diagram: JVMs hosting APP clients, SQLFire servers, and a SQLFire Locator]

Topology: Embedded peer-to-peer

[Diagram: three JVMs, SQLFire]

Synchronization strategies…

Synchronous strategy

In data-center or over private network

[Diagram: Site 1 and Site 2, Redundancy Zone A and Redundancy Zone B, each side with APP JVMs, SQLFire JVMs, and a SQLFire Locator]

Asynchronous strategy

Multi-site over the Cloud

[Diagram: Site 1 and Site 2, each with APP JVMs, SQLFire JVMs, and a SQLFire Locator, linked by a WAN Gateway]

Data strategies – Server Groups

[Diagram: a SQLFire cluster with SQLFire members arranged into Group 1, Group 2, and Group 3]

data demo…

Summary…

Why SQLFire?

§ In-memory: delivers maximum speed and minimum latency

§ Horizontally scalable: easily adapts to changing workloads and usage patterns

§ Familiar SQL interface: accessible from Java and .NET

If you forgot everything else…

SQLFire is better at supporting online applications than traditional databases.

Sample Apps

§ Side-by-side comparison of SQLFire v MySQL performance: https://github.com/vFabric/sqlf-demo

§ Demo call-center application, SQLFire configuration scripts: https://github.com/vFabric/sqlf-cloud

Demo Video

§ Real-life performance comparison (YouTube, 3 min.): http://youtu.be/HV-broQHJlk

SQLFire Artifacts

http://vmware.com/go/sqlfire

@vFabricSQLFire, @_cmc

The end
