VMware vFabric GemFire for High-Performance, Resilient Distributed Apps

Post on 19-Nov-2014

description

Learn how VMware vFabric GemFire helps build

transcript

© 2010 VMware Inc. All rights reserved

Building High-Performance, Data-Intensive, Resilient, Distributed Applications

2

The Challenge

Data Explosion

Decision Time Compression

Critical to act within the time frame that matters
• Milliseconds
• Seconds
• Minutes or even hours

3

Today’s Modern Architectures

Web

Application

Database Tier

Storage Tier

Stateless

Stateful

Load Balancer

Web Applications

4

Today’s Modern Architectures

Data Ingest Applications

5

Agenda - Challenges and Solutions

1. High Performance & Data Intensive
• Latency
• Scale

2. Resilient
• Reliability
• Availability

3. Solutions

6

Sources of Latency

Disk access

Serial access

Network time

Sockets (open/close)

Marshalling and unmarshalling

Security overhead

7

Sources of Latency - Disk Access

Mitre Public Release: 10-0861. Distribution Unlimited

8

Network time

Keep sockets open to all members
• Helps with security performance

Minimize network hops

Push computing to data

9

Marshalling/Unmarshalling

Lazy deserialization

Index serialized data

Shared compact data format
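
Lazy deserialization can be illustrated with a small sketch. This is not GemFire code: the `LazyValue` class, its JSON encoding, and the `decode_count` counter are assumptions made up for this example. The idea is that a value stays in its serialized form while it is stored or forwarded and is decoded only when something actually reads a field.

```python
import json


class LazyValue:
    """Hypothetical wrapper: keeps the serialized bytes, decodes on first read."""

    def __init__(self, payload: bytes):
        self._payload = payload
        self._decoded = None
        self._is_decoded = False
        self.decode_count = 0  # instrumentation for this example only

    def raw(self) -> bytes:
        # Storing or forwarding the value never requires decoding it.
        return self._payload

    def get(self):
        # Decode once, then reuse the cached object.
        if not self._is_decoded:
            self._decoded = json.loads(self._payload.decode("utf-8"))
            self._is_decoded = True
            self.decode_count += 1
        return self._decoded


entry = LazyValue(json.dumps({"id": 7, "fuel": 41.5}).encode("utf-8"))
forwarded = entry.raw()            # still serialized, no decode yet
fuel = entry.get()["fuel"]         # first read triggers the decode
fuel_again = entry.get()["fuel"]   # served from the cached object
```

A server that only relays or stores `entry.raw()` pays no unmarshalling cost at all in this sketch.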

10

Security overhead

Mutual authentication at socket time

Process/user level (optional)

12

Architecting Infinitely Scalable Systems

http://blogs.msdn.com/b/pathelland/

A seminal paper on the architecture of elastic applications by Pat Helland (Tandem Computers, Amazon.com, Microsoft)

“Life Beyond Distributed Transactions: an Apostate’s Opinion” http://www.cidrdb.org/cidr2007/papers/cidr07p15.pdf

Application architectures need to change to achieve infinite scalability and elasticity without using large hardware

13

Scale - Layered Code

Top layer: Scale-Agnostic Code

Abstraction layer: Programming Abstraction

Bottom layer: Scale-Aware Code (understands that the application is distributed)

Common layered architecture in the largest-scale applications

14

Scale

Shared nothing

Partition/Sharding

Collocated relations

Replicated reference
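
The four ideas on this slide can be sketched together. The code below is an illustration only, not GemFire's partitioning scheme: the node names, the `owner` function, and the use of CRC32 as the hash are assumptions. Customers are partitioned by ID, orders are placed by their customer's ID so related records share a node, and small reference data (products) is copied to every node.

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical shared-nothing members


def owner(routing_key: str) -> str:
    """Map a routing key to a node using a stable hash (CRC32)."""
    return NODES[zlib.crc32(routing_key.encode("utf-8")) % len(NODES)]


def place_customer(customer_id: str) -> str:
    # Partitioned data: each customer lives on exactly one node.
    return owner(customer_id)


def place_order(order_id: str, customer_id: str) -> str:
    # Collocated relation: route by the customer ID, not the order ID,
    # so an order always lands next to its customer.
    return owner(customer_id)


# Replicated reference data: every node holds a full copy.
products = {"p1": "widget", "p2": "gadget"}
replicas = {node: dict(products) for node in NODES}

customer_node = place_customer("c-42")
order_node = place_order("o-1001", "c-42")
```

Because an order and its customer share a node and product data is local everywhere, a join across the three needs no network hop in this sketch.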

16

Reliability

• No data loss
• No data corruption
• Consistency

• Race condition
• Synchronous vs. asynchronous

17

No Data Loss, No Data Corruption, Consistency

Distributed semaphore - lightweight

Primary copy

Distributed transaction(s) – heavyweight

MVCC – Acronyms are annoying
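
MVCC stands for multi-version concurrency control. A toy, single-process sketch of the idea follows; the `VersionedStore` class and its logical clock are assumptions for illustration and say nothing about how any particular product implements it. Writers append new versions instead of overwriting, so a reader holding an older snapshot keeps seeing a consistent value.

```python
class VersionedStore:
    """Hypothetical in-memory store that keeps every committed version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit timestamp

    def write(self, key, value):
        # Each write commits a new version; old versions are kept.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        # A reader remembers the commit timestamp at which it started.
        return self._clock

    def read(self, key, snapshot_ts):
        # Return the newest value committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("fuel", 100)
reader_snapshot = store.snapshot()
store.write("fuel", 90)  # a concurrent writer commits later

seen_by_reader = store.read("fuel", reader_snapshot)
seen_by_new_reader = store.read("fuel", store.snapshot())
```

The earlier reader is never blocked by the later write and never observes a half-applied update, which is the property the slide is pointing at.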

18

Race Conditions, Consistency

Application

Data Tier

Stateless?

Stateful

Eventually Consistent

Controllably Consistent

20

Availability - on Server

Protect data
• Extra copies
• Disk?

Data Center crashes

Network Splits
• Split Brain detection
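
One common way to detect a split brain is a majority check, sketched below. The slides do not say which mechanism GemFire uses, so the `has_quorum` function and the strict-majority rule are assumptions for illustration: after a network split, only the side that still sees more than half of the known members keeps accepting writes.

```python
def has_quorum(visible_members, all_members):
    """True when this side of a split sees a strict majority of the membership."""
    known = set(all_members)
    visible = set(visible_members) & known
    return len(visible) * 2 > len(known)


cluster = ["m1", "m2", "m3", "m4", "m5"]

side_a = has_quorum(["m1", "m2", "m3"], cluster)  # larger side of the split
side_b = has_quorum(["m4", "m5"], cluster)        # smaller side of the split
even_split = has_quorum(["m1", "m2"], ["m1", "m2", "m3", "m4"])
```

With a strict majority at most one side can win, and an even split leaves neither side with quorum, so two halves never both act as the surviving cluster.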

21

Availability – Between Client/Server

Slow Consumers
• HA Queues

Client Network drops
• Durable subscribers
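
A durable subscriber can be pictured as a server-side queue that outlives the client's connection. The sketch below is a minimal illustration under that assumption; the `DurableSubscription` class and its method names are hypothetical, and a real HA queue would also be replicated to another server. Events published while the client is disconnected are retained and delivered in order on reconnect.

```python
from collections import deque


class DurableSubscription:
    """Hypothetical server-side queue kept on behalf of one client."""

    def __init__(self):
        self.connected = False
        self._pending = deque()  # events not yet delivered
        self.delivered = []      # stands in for the client's inbox

    def publish(self, event):
        # Always enqueue first, so nothing is lost if the client is away.
        self._pending.append(event)
        if self.connected:
            self._drain()

    def connect(self):
        self.connected = True
        self._drain()  # replay everything missed while disconnected

    def disconnect(self):
        self.connected = False

    def _drain(self):
        while self._pending:
            self.delivered.append(self._pending.popleft())


sub = DurableSubscription()
sub.connect()
sub.publish("e1")
sub.disconnect()         # client network drops
sub.publish("e2")
sub.publish("e3")
delivered_while_down = list(sub.delivered)
sub.connect()            # client comes back
delivered_after_reconnect = list(sub.delivered)
```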

23

Latency & Reliability - Memory-based Performance

Perform

Use memory on a peer machine to make data updates durable
• Writes return 10x to 100x faster than disk
• 10s to 100s of microseconds vs. 10s to 100s of milliseconds

Protect

Keep redundant copies of data
• Update through primary
• 0 data loss
• Optionally write updates to disk
• Optionally write to a data warehouse asynchronously and reliably

Customers

Orders

Product
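
The write path described on this slide can be sketched as follows. This is an illustration under stated assumptions, not GemFire's implementation: the `Member` and `ReplicatedRegion` classes, the in-process "peer", and the dictionary standing in for disk are all made up. An update goes through the primary, is copied synchronously into a peer's memory, and is written to disk later from a queue.

```python
from collections import deque


class Member:
    """Hypothetical cluster member holding data in memory."""

    def __init__(self, name):
        self.name = name
        self.memory = {}


class ReplicatedRegion:
    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.write_behind = deque()  # updates waiting for the slow store
        self.disk = {}               # stands in for disk or a data warehouse

    def put(self, key, value):
        # Update through the primary, then copy to the peer's memory.
        self.primary.memory[key] = value
        self.backup.memory[key] = value
        # Disk is not on the critical path: queue the update and return.
        self.write_behind.append((key, value))

    def flush(self):
        # Runs later (asynchronously in a real system) to drain the queue.
        while self.write_behind:
            key, value = self.write_behind.popleft()
            self.disk[key] = value


region = ReplicatedRegion(Member("primary"), Member("backup"))
region.put("order-1", {"customer": "c-42", "product": "p1"})

in_backup_before_flush = "order-1" in region.backup.memory
on_disk_before_flush = "order-1" in region.disk
region.flush()
on_disk_after_flush = "order-1" in region.disk
```

The caller waits only for the two memory copies, which is where the latency advantage over a synchronous disk write comes from in this sketch.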

24

Memory-based Performance

Perform

In situ data processing

Real-time controls

Calculate: current total fuel left

25

Latency - Data-Aware Access

Perform

Application Client

Java, C++, .NET, SQL

26

Latency & Reliability - HA Data-Aware Function

Execute

Move behavior to data

Client

Data-Aware Function
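
"Move behavior to data" means the client sends a function to the member that owns a key rather than pulling the record across the network. The sketch below illustrates that routing; the `Node` and `Cluster` classes, the CRC32 placement, and the fuel-tank record are assumptions for this example and are not GemFire's function execution API.

```python
import zlib


class Node:
    """Hypothetical member that stores records and runs functions locally."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def execute(self, func, key):
        # The function runs next to the data; only the result travels back.
        return func(self.data[key])


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def owner(self, key):
        # Stable hash so every client routes a key to the same node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).data[key] = value

    def execute_on(self, key, func):
        return self.owner(key).execute(func, key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("flight-12", {"tanks": [120.0, 80.5]})

# "Calculate: current total fuel left" without shipping the record to the client.
total_fuel = cluster.execute_on("flight-12", lambda rec: sum(rec["tanks"]))
```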

27

Parallel Queries

Client

Scatter-Gather Queries & Functions

Compute
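
A scatter-gather query sends the same query to every partition in parallel and merges the partial results. The following sketch shows the pattern with threads standing in for cluster members; the partition contents, `local_query`, and `scatter_gather` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands in for the orders held by one member.
partitions = [
    [{"id": 1, "total": 30}, {"id": 2, "total": 75}],
    [{"id": 3, "total": 120}],
    [{"id": 4, "total": 10}, {"id": 5, "total": 90}],
]


def local_query(partition, min_total):
    # Runs against one partition only.
    return [order["id"] for order in partition if order["total"] >= min_total]


def scatter_gather(min_total):
    # Scatter: one task per partition. Gather: merge the partial results.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda part: local_query(part, min_total), partitions)
        return sorted(order_id for part in partials for order_id in part)


large_orders = scatter_gather(75)
```

Each partition filters only its own data, so adding members shrinks the work per member while the merge step stays small.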

28

Data Distribution

Distribute

Keep clusters synchronized in real time. Operate reliably in disconnected, intermittent, and low-bandwidth network environments.

29

Distributed Events

Targeted, guaranteed delivery. Event notification & continuous queries

Notify

Disconnected, Intermittent and Low-Bandwidth network environments
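
A continuous query is a query that stays registered and pushes matching changes to the client as they happen. The sketch below shows the idea in a single process; the `ContinuousQueryRegion` class, the predicate and callback signatures, and the low-fuel example are assumptions for illustration, not a product API.

```python
class ContinuousQueryRegion:
    """Hypothetical region that evaluates registered queries on every update."""

    def __init__(self):
        self._data = {}
        self._queries = []  # list of (predicate, callback)

    def register(self, predicate, callback):
        self._queries.append((predicate, callback))

    def put(self, key, value):
        self._data[key] = value
        # Targeted delivery: only listeners whose predicate matches are notified.
        for predicate, callback in self._queries:
            if predicate(value):
                callback(key, value)


alerts = []
region = ContinuousQueryRegion()
region.register(lambda v: v["fuel"] < 20, lambda k, v: alerts.append(k))

region.put("flight-12", {"fuel": 55})  # does not match, no event
region.put("flight-31", {"fuel": 12})  # matches, listener is notified
```

The client never polls in this sketch; it receives an event only when an update satisfies its query.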

30

Cloud Ready

Web Tier

Application Tier

Load Balancer

Optional reliable, asynchronous feed to Data Warehouse or Archival Database

GemFire Jar 11MB (or less)

Soar

31

Thank you