
An engineer's introduction to in-memory data grid development

Transcript
Page 1: An engineer's introduction to in-memory data grid development


An engineer’s introduction to in-memory data grid development

Andrei Cristian Niculae

Technology Presales Consultant

Page 2: An engineer's introduction to in-memory data grid development

Agenda

• What Is Coherence

– Distributed Data Grid

• How Does It Work?

• Use Cases

• Conclusion

• Q&A

Page 3: An engineer's introduction to in-memory data grid development


Distributed Data Grid

Page 4: An engineer's introduction to in-memory data grid development


“A Distributed Data Grid is a system composed of multiple servers that work together to manage

information and related operations - such as computations - in a distributed environment.”

Cameron Purdy

VP of Development, Oracle

Page 5: An engineer's introduction to in-memory data grid development

Scalability Chasm

[Diagram: an ever-expanding universe of users driving Data Demand through web and application servers, while Data Supply lags behind]

• Data Demand outpacing Data Supply

• Rate of growth outpacing ability to cost-effectively scale applications

Page 6: An engineer's introduction to in-memory data grid development

Performance Problem

A Performance Bottleneck

[Diagram: Java application objects mapped via SQL to relational database tables]

• Volume

• Complexity

• Frequency of Data Access

Page 7: An engineer's introduction to in-memory data grid development

Oracle Coherence as Data Broker

[Diagram: Coherence in the middle tier between web/application servers and back-end data sources, serving data as objects]

• Oracle Coherence brokers Data Supply with Data Demand

• Scale out the Data Grid in the middle tier using commodity hardware

Page 8: An engineer's introduction to in-memory data grid development

Coherence In-Memory Data Grid
Linear scaling, extreme performance, unparalleled reliability

• Memory spans multiple machines

• Add/remove nodes dynamically

• Scale linearly to thousands

• Reliability through redundancy

• Performance through parallelization

• Integration through shared memory grid

Page 9: An engineer's introduction to in-memory data grid development


How does it work

Page 10: An engineer's introduction to in-memory data grid development

Oracle Coherence: A Unique Approach

• Dynamically scale out during operation

• Data automatically load-balanced to new servers in the cluster

• No repartitioning required

• No reconfiguration required

• No interruption to service during scale-out

• Scale capacity and processing on-the-fly

Page 11: An engineer's introduction to in-memory data grid development

Oracle Coherence: How it Works

• Data is automatically partitioned and load-balanced across the Server Cluster

• Data is synchronously replicated for continuous availability

• Servers monitor the health of each other

• When in doubt, servers work together to diagnose status

• Healthy servers assume responsibility for failed server (in parallel)

• Continuous Operation: No interruption to service or data loss due to a server failure

Page 12: An engineer's introduction to in-memory data grid development

Oracle Coherence: Summary

• Peer-to-Peer Clustering and Data Management Technology

• No Single Points of Failure

• No Single Points of Bottleneck

• No Masters / Slaves / Registries etc

• All members have responsibility to:

• Manage Cluster Health & Data

• Perform Processing and Queries

• Work as a “team” in parallel

• Communication is point-to-point (not TCP/IP) and/or one-to-many

• Scale to limit of the back-plane

• Use with commodity infrastructure

• Linearly Scalable By Design

Page 13: An engineer's introduction to in-memory data grid development


Coherence Topology Example

Distributed Topology

• Data spread and backed up across Members

• Transparent to developer

• Members have access to all Data

• All Data locations are known – no lookup & no registry!

Page 14: An engineer's introduction to in-memory data grid development


Coherence Update Example

Distributed Topology

• Synchronous Update

• Avoids potential Data Loss & Corruption

• Predictable Performance

• Backup Partitions are partitioned away from Primaries for resilience

• No engineering requirement to setup Primaries or Backups

• Automatically and Dynamically Managed

Page 15: An engineer's introduction to in-memory data grid development


Coherence Recovery Example

Distributed Topology

• No in-flight operations lost

• Some latencies (due to higher priority of recovery)

• Reliability, Availability, Scalability, Performance are the priority

• Performance of some requests may degrade during recovery

Page 16: An engineer's introduction to in-memory data grid development

Coherence
Distributed data management for applications

• Development Library

– Pure Java 1.4.2+

– Pure .Net 1.1 and 2.0 (client)

– C++ client (3.4)

– No Third-Party Dependencies

– No Open Source Dependencies

– Proprietary Network Stack (Peer-To-Peer model)

• Other Libraries Support…

– Database and File System Integration

– TopLink and Hibernate

– HTTP Session Management

– WebLogic Portal Caches

– Spring, Groovy

Page 17: An engineer's introduction to in-memory data grid development

The Portable Object Format

Advanced Serialization

• Simple Serialization Comparison

– In XML

• <date format=“java.util.Date”>2008-07-03</date>

• 47 characters (possibly 94 bytes depending on encoding)

– In Java (as a raw long)

• 64 bits = 8 bytes

– In Java (java.util.Date using ObjectOutputStream)

• 46 bytes

– In ExternalizableLite (as a raw long)

• 8 bytes

– In POF (Portable Object Format)

• 4F 58 1F 70 6C = 5 bytes
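The plain-Java numbers above are easy to check with the JDK alone (POF itself requires the Coherence library, so it is left out of this sketch; the class and method names here are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Date;

public class SerializationSizes {

    // A date as a raw long (millis since the epoch): always 8 bytes.
    static int rawLongSize(long millis) {
        return ByteBuffer.allocate(Long.BYTES).putLong(millis).array().length;
    }

    // The same date through default Java serialization: stream header plus
    // class descriptor plus field data, far larger than the 8 bytes of payload.
    static int jdkSerializedSize(Date d) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(d);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Date d = new Date(1215043200000L); // 2008-07-03 UTC
        System.out.println("raw long: " + rawLongSize(d.getTime()) + " bytes");
        System.out.println("ObjectOutputStream: " + jdkSerializedSize(d) + " bytes");
    }
}
```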

(c) Copyright 2008. Oracle Corporation

Page 18: An engineer's introduction to in-memory data grid development


Coherence Cache Types / Strategies

Replicated Cache
• Topology: Replicated
• Read Performance: Instant
• Fault Tolerance: Extremely High
• Write Performance: Fast
• Memory Usage (Per JVM): DataSize
• Memory Usage (Total): JVMs x DataSize
• Coherency: Fully coherent
• Locking: Fully transactional
• Typical Uses: Metadata

Optimistic Cache
• Topology: Replicated
• Read Performance: Instant
• Fault Tolerance: Extremely High
• Write Performance: Fast
• Memory Usage (Per JVM): DataSize
• Memory Usage (Total): JVMs x DataSize
• Coherency: Fully coherent
• Locking: None
• Typical Uses: n/a (see Near Cache)

Partitioned Cache
• Topology: Partitioned Cache
• Read Performance: Locally cached: instant; Remote: network speed
• Fault Tolerance: Configurable, Zero to Extremely High
• Write Performance: Extremely fast
• Memory Usage (Per JVM): DataSize/JVMs x Redundancy
• Memory Usage (Total): Redundancy x DataSize
• Coherency: Fully coherent
• Locking: Fully transactional
• Typical Uses: Read-write caches

Near Cache (backed by partitioned cache)
• Topology: Local Caches + Partitioned Cache
• Read Performance: Locally cached: instant; Remote: network speed
• Fault Tolerance: Configurable, Zero to Extremely High
• Write Performance: Extremely fast
• Memory Usage (Per JVM): LocalCache + [DataSize / JVMs]
• Memory Usage (Total): [Redundancy x DataSize] + [JVMs x LocalCache]
• Coherency: Fully coherent
• Locking: Fully transactional
• Typical Uses: Read-heavy caches w/ access affinity

LocalCache (not clustered)
• Topology: Local Cache
• Read Performance: Instant
• Fault Tolerance: Zero
• Write Performance: Instant
• Memory Usage (Per JVM): DataSize
• Memory Usage (Total): n/a
• Coherency: n/a
• Locking: Fully transactional
• Typical Uses: Local data

Page 19: An engineer's introduction to in-memory data grid development


Data Management Options

Page 20: An engineer's introduction to in-memory data grid development

Data Management

Partitioned Caching

• Extreme Scalability: Automatically, dynamically and transparently partitions the data set across the members of the grid.

• Pros:
– Linear scalability of data capacity
– Processing power scales with data capacity
– Fixed cost per data access

• Cons:
– Cost Per Access: high percentage chance that each data access will go across the wire

• Primary Use:
– Large in-memory storage environments
– Parallel processing environments

Page 21: An engineer's introduction to in-memory data grid development

Data Management

Partitioned Fault Tolerance

• Automatically, dynamically and transparently manages the fault tolerance of your data.

• Backups are guaranteed to be on a separate physical machine from the primary.

• Backup responsibilities for one node’s data are shared amongst the other nodes in the grid.

Page 22: An engineer's introduction to in-memory data grid development

Data Management

Cache Client/Cache Server

• Partitioning can be controlled on a member-by-member basis.

• A member is either responsible for an equal partition of the data or not (“storage enabled” vs. “storage disabled”)

• Cache Clients – typically the application instances

• Cache Servers – typically stand-alone JVMs responsible for storage and data processing only

Page 23: An engineer's introduction to in-memory data grid development

Data Management

Near Caching

• Extreme Scalability and Extreme Performance: the best of both worlds between the Replicated and Partitioned topologies. Most recently/frequently used data is stored locally.

• Pros:
– All of the same Pros as the Partitioned topology, plus…
– High percentage chance data is local to the request

• Cons:
– Cost Per Update: there is a cost associated with each update to a piece of data that is stored locally on other nodes

• Primary Use:
– Large in-memory storage environments with likelihood of repetitive data access
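The near-cache read and update paths can be sketched with two plain JDK maps standing in for the local front tier and the partitioned back tier (NearCacheSketch and its backReads counter are illustrative names, not Coherence API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NearCacheSketch {
    // Front map: small and local, holds recently used entries.
    final Map<String, String> front = new ConcurrentHashMap<>();
    // Back map: stands in for the partitioned (distributed) cache.
    final Map<String, String> back = new ConcurrentHashMap<>();
    int backReads = 0; // counts "network" trips to the back tier

    String get(String key) {
        String v = front.get(key);
        if (v == null) {                      // front miss: go to the back tier
            backReads++;
            v = back.get(key);
            if (v != null) front.put(key, v); // populate the front for next time
        }
        return v;
    }

    // The cost-per-update Con: an update must invalidate front copies.
    void put(String key, String value) {
        back.put(key, value);
        front.remove(key); // local invalidation; a real near cache also
                           // invalidates the front maps on other nodes
    }
}
```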

Page 24: An engineer's introduction to in-memory data grid development

Caching Patterns and Terminology

Page 25: An engineer's introduction to in-memory data grid development

Cache Aside

• The Cache Aside pattern means the developer manages the contents of the cache manually:
– Check the cache before reading from the data source
– Put data into the cache after reading from the data source
– Evict or update the cache when updating the data source

• Cache Aside can be written as a core part of the application logic, or it can be a cross-cutting concern (using AOP) if the data access pattern is generic
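Those three manual steps can be sketched with plain maps standing in for the cache and the data source (CacheAside and storeReads are illustrative names):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAside {
    final Map<String, String> cache = new ConcurrentHashMap<>();
    final Map<String, String> store = new ConcurrentHashMap<>(); // stand-in data source
    int storeReads = 0; // counts trips to the "database"

    // Read path: check the cache first, fall back to the source, then populate.
    String read(String key) {
        String v = cache.get(key);
        if (v == null) {          // cache miss: the application goes to the source
            storeReads++;
            v = store.get(key);
            if (v != null) cache.put(key, v); // populate for the next reader
        }
        return v;
    }

    // Write path: update the source of record, then evict the stale entry.
    void write(String key, String value) {
        store.put(key, value);
        cache.remove(key); // next read refreshes from the source
    }
}
```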

Page 26: An engineer's introduction to in-memory data grid development

Read Through / Write Through

• Read Through
– All data reads occur through the cache
– If there is a cache miss, the cache will load the data from the data source automatically

• Write Through
– All data writes occur through the cache
– Updates to the cache are written synchronously to the data source
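A minimal sketch of both patterns, with a simple in-memory Store interface standing in for a real loader/store against a backend (all names here are illustrative, not Coherence API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadWriteThroughCache {
    // Stand-in for the integration point with the real data source.
    interface Store {
        String load(String key);
        void save(String key, String value);
    }

    static class MapStore implements Store {
        final Map<String, String> data = new HashMap<>();
        int loads = 0;
        public String load(String key) { loads++; return data.get(key); }
        public void save(String key, String value) { data.put(key, value); }
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Store store;

    ReadWriteThroughCache(Store store) { this.store = store; }

    // Read-through: on a miss, the cache itself loads from the store.
    String get(String key) {
        return cache.computeIfAbsent(key, store::load);
    }

    // Write-through: the store is updated synchronously before put() returns.
    void put(String key, String value) {
        store.save(key, value);
        cache.put(key, value);
    }
}
```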

Page 27: An engineer's introduction to in-memory data grid development

Write Behind

• All data writes occur through the cache
– Just like Write Through

• Updates to the cache are written asynchronously to the data source

Page 28: An engineer's introduction to in-memory data grid development


How it Works

Page 29: An engineer's introduction to in-memory data grid development

CacheStore

• CacheLoader and CacheStore (normally referred to generically as a CacheStore) are generic interfaces

• There are no restrictions on the type of data store that can be used:
– Databases
– Web services
– Mainframes
– File systems
– Remote clusters
– …even another Coherence cluster!

Page 30: An engineer's introduction to in-memory data grid development

Read Through

• Every key in a Coherence cache is assigned (hashed) to a specific cache server

• If a get results in a cache miss, the cache entry will be loaded from the data store via the cache server, through the provided CacheLoader implementation

• Multiple simultaneous requests for the same key will result in a single load request to the data store
– This is a key advantage over Cache Aside
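The hashing step can be sketched as a key-to-partition mapping, where each partition has exactly one owning server (partitionFor is an illustrative helper; Coherence's actual partitioning service is more sophisticated):

```java
public class KeyPartitioning {
    // Every key hashes to exactly one partition, so all requests for a
    // given key are routed to the same owning cache server.
    static int partitionFor(Object key, int partitionCount) {
        // Math.floorMod keeps the result in [0, partitionCount) even when
        // hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        System.out.println("trade-42 -> partition "
                + partitionFor("trade-42", 257));
    }
}
```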

Page 39: An engineer's introduction to in-memory data grid development

Refresh Ahead

• Coherence can use a CacheLoader to fetch data asynchronously from the data store before expiry

• This means that no client threads are blocked while the data is refreshed from the data store

• This mechanism is triggered by a cache get – items that are not accessed will be allowed to expire, thus requiring a read-through on the next access
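A minimal, deterministic sketch of the refresh-ahead decision; the refresh is inlined here rather than handed to a background thread, and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class RefreshAheadCache {
    interface Loader { String load(String key); }

    static class Entry { String value; long loadedAt; }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Loader loader;
    private final long expiryMillis;
    private final long refreshAheadMillis; // soft-expiry threshold < expiryMillis
    int asyncRefreshes = 0;                // counts refresh-ahead loads

    RefreshAheadCache(Loader loader, long expiryMillis, long refreshAheadMillis) {
        this.loader = loader;
        this.expiryMillis = expiryMillis;
        this.refreshAheadMillis = refreshAheadMillis;
    }

    // 'now' is passed in to keep the sketch deterministic; a real cache
    // would consult the system clock.
    String get(String key, long now) {
        Entry e = cache.get(key);
        if (e == null || now - e.loadedAt >= expiryMillis) {
            // absent or fully expired: synchronous read-through, caller blocks
            e = new Entry();
            e.value = loader.load(key);
            e.loadedAt = now;
            cache.put(key, e);
        } else if (now - e.loadedAt >= refreshAheadMillis) {
            // soft-expired: serve the current value immediately and refresh
            // (inlined here; a real implementation uses a worker thread)
            String current = e.value;
            asyncRefreshes++;
            e.value = loader.load(key);
            e.loadedAt = now;
            return current;
        }
        return e.value;
    }
}
```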

Page 48: An engineer's introduction to in-memory data grid development

Write Through

• Updates to the cache will be synchronously written to the data store via the cache server responsible for the given key, through the provided CacheStore

• Advantages
– Write to cache can be rolled back upon data store error

• Disadvantages
– Write performance can only scale as far as the database will allow

Page 56: An engineer's introduction to in-memory data grid development

Write Behind

• Updates to the cache will be asynchronously written to the data store via the cache server responsible for the given key, through the provided CacheStore

Page 57: An engineer's introduction to in-memory data grid development

Write Behind Factors

• Advantages
– Application response time and scalability are decoupled from the data store, ideal for write-heavy applications
– Multiple updates to the same cache entry are coalesced, reducing the load on the data store
– Application can continue to function upon data store failure

• Disadvantages
– In the event of multiple simultaneous server failures, some data in the write-behind queue may be lost
– Loss of a single physical server will result in no data loss
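The coalescing advantage can be sketched with a pending-write queue keyed by cache key, so repeated updates to one entry collapse into a single store write (WriteBehindQueue and storeWrites are illustrative names):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class WriteBehindQueue {
    private final Map<String, String> cache = new HashMap<>();
    // Pending writes keyed by cache key: a later update to the same key
    // replaces the queued one, so the store only sees the latest value.
    private final LinkedHashMap<String, String> pending = new LinkedHashMap<>();
    int storeWrites = 0; // counts writes that actually reach the "database"

    void put(String key, String value) {
        cache.put(key, value);   // the cache is updated immediately
        pending.put(key, value); // the store write is deferred and coalesced
    }

    String get(String key) {
        return cache.get(key);
    }

    // Drain pending writes to the store; a real implementation runs this
    // asynchronously after a configured write-behind delay.
    void flush(Map<String, String> store) {
        for (Map.Entry<String, String> e : pending.entrySet()) {
            store.put(e.getKey(), e.getValue());
            storeWrites++;
        }
        pending.clear();
    }
}
```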

Page 66: An engineer's introduction to in-memory data grid development


Data Processing Options

Page 67: An engineer's introduction to in-memory data grid development

Data Processing

Parallel Query

• Programmatic query mechanism

• Queries performed in parallel across the grid

• Standard indexes provided out-of-the-box, with support for implementing your own custom indexes

Page 68: An engineer's introduction to in-memory data grid development

// get the "myTrades" cache
NamedCache cacheTrades = CacheFactory.getCache("myTrades");

// create the "query"
Filter filter = new AndFilter(
    new EqualsFilter("getTrader", traderid),
    new EqualsFilter("getStatus", Status.OPEN));

// perform the parallel query
Set setOpenTrades = cacheTrades.entrySet(filter);

Page 73: An engineer's introduction to in-memory data grid development

Data Processing

Continuous Query Cache

• Automatically, transparently and dynamically maintains a local view based on specific criteria (i.e. a Filter)

• Same API as all other Coherence caches

• Supports local listeners

• Supports layered views

Page 74: An engineer's introduction to in-memory data grid development

Data Processing

Continuous Query Cache

// get the "myTrades" cache
NamedCache cacheTrades = CacheFactory.getCache("myTrades");

// create the "query"
Filter filter = new AndFilter(
    new EqualsFilter("getTrader", traderid),
    new EqualsFilter("getStatus", Status.OPEN));

// create the continuous query cache
NamedCache cqcOpenTrades = new ContinuousQueryCache(cacheTrades, filter);

Page 79: An engineer's introduction to in-memory data grid development

Data Processing

Invocable Map

• The inverse of caching

• Sends the processing (e.g. EntryProcessors) to where the data is in the grid

• Standard EntryProcessors provided out-of-the-box

• Once-and-only-once guarantees

• Processing is automatically fault-tolerant

• Processing can be:
– Targeted to a specific key
– Targeted to a collection of keys
– Targeted to any object that matches specific criteria (i.e. a Filter)

Page 80: An engineer's introduction to in-memory data grid development

Data Processing

Invocable Map

// get the "myTrades" cache
NamedCache cacheTrades = CacheFactory.getCache("myTrades");

// create the "query"
Filter filter = new AndFilter(
    new EqualsFilter("getTrader", traderid),
    new EqualsFilter("getStatus", Status.OPEN));

// perform the parallel processing
cacheTrades.invokeAll(filter, new CloseTradeProcessor());

Page 85: An engineer's introduction to in-memory data grid development


Code Examples

Page 86: An engineer's introduction to in-memory data grid development

Cluster cluster = CacheFactory.ensureCluster();

Clustering Java Processes

• Joins an existing cluster or forms a new cluster
– Time “to join” is configurable

• cluster contains information about the Cluster:
– Cluster Name
– Members
– Locations
– Processes

• No “master” servers

• No “server registries”


Page 87: An engineer's introduction to in-memory data grid development

Leaving a Cluster

• Leaves the current cluster

• shutdown blocks until “data” is safe

• Failing to call shutdown results in Coherence having to detect process death/exit and recover information from another process

• Death detection and recovery is automatic

CacheFactory.shutdown();

Page 88: An engineer's introduction to in-memory data grid development

Using a Cache

get, put, size & remove

• CacheFactory resolves cache names (ie: “mine”) to configured NamedCaches

• NamedCache provides data-topology-agnostic access to information

• NamedCache implements several interfaces:
– java.util.Map, JCache, ObservableMap*, ConcurrentMap*, QueryMap*, InvocableMap*

NamedCache nc = CacheFactory.getCache("mine");

Object previous = nc.put("key", "hello world");
Object current = nc.get("key");
int size = nc.size();
Object value = nc.remove("key");

* Coherence Extensions

Page 89: An engineer's introduction to in-memory data grid development

Using a Cache

keySet, entrySet, containsKey

• Using a NamedCache is like using a java.util.Map

• What is the difference between a Map and a Cache data structure?
– Both use (key, value) pairs for entries
– Map entries don’t expire; Cache entries may expire
– Maps are typically limited by heap space; Caches are typically size-limited (by number of entries or memory)
– Map content is typically in-process (on heap)

NamedCache nc = CacheFactory.getCache("mine");

Set keys = nc.keySet();
Set entries = nc.entrySet();
boolean exists = nc.containsKey("key");
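The size-limited behavior mentioned above can be sketched in plain Java with LinkedHashMap's access-order mode, which yields a small LRU cache (BoundedCache is an illustrative name, not part of Coherence):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put: evict the least-recently-used
    // entry once the cache exceeds its size limit.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```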

Page 90: An engineer's introduction to in-memory data grid development

Observing Cache Changes

ObservableMap

• Observe changes in real-time as they occur in a NamedCache

• Options exist to optimize events by using Filters (including pre- and post-condition checking) and by reducing the on-the-wire payload (Lite Events)

• Several MapListeners are provided out-of-the-box
– Abstract, Multiplexing...

NamedCache nc = CacheFactory.getCache("stocks");

nc.addMapListener(new MapListener() {
    public void entryInserted(MapEvent mapEvent) {
    }

    public void entryUpdated(MapEvent mapEvent) {
    }

    public void entryDeleted(MapEvent mapEvent) {
    }
});

Page 91: An engineer's introduction to in-memory data grid development

Querying Caches

QueryMap

• Query NamedCache keys and entries across a cluster (Data Grid) in parallel* using Filters

• Results may be ordered using natural ordering or custom comparators

• Filters provide support for almost all SQL constructs

• Query using non-relational data representations and models

• Create your own Filters

* Requires Enterprise Edition or above

NamedCache nc = CacheFactory.getCache("people");

Set keys = nc.keySet(new LikeFilter("getLastName", "%Stone%"));

Set entries = nc.entrySet(new EqualsFilter("getAge", 35));

Page 92: An engineer's introduction to in-memory data grid development

Continuous Observation

Continuous Query Caches

• ContinuousQueryCache provides a real-time, in-process copy of filtered cached data

• Use standard or your own custom Filters to limit the view

• Access to the “view” of cached information is instant

• May be used with MapListeners to support rendering real-time local views (aka: Thick Client) of Data Grid information

NamedCache nc = CacheFactory.getCache("stocks");

NamedCache expensiveItems = new ContinuousQueryCache(nc,
    new GreaterThan("getPrice", 1000));

Page 93: An engineer's introduction to in-memory data grid development

Aggregating Information

InvocableMap

• Aggregate values in a NamedCache across a cluster (Data Grid) in parallel* using Filters

• Aggregation constructs include: Distinct, Sum, Min, Max, Average, Having, Group By

• Aggregate using non-relational data models

• Create your own aggregators

* Requires Enterprise Edition or above

NamedCache nc = CacheFactory.getCache("stocks");

Double total = (Double) nc.aggregate(
    AlwaysFilter.INSTANCE,
    new DoubleSum("getQuantity"));

Set symbols = (Set) nc.aggregate(
    new EqualsFilter("getOwner", "Larry"),
    new DistinctValue("getSymbol"));

Page 94: An engineer's introduction to in-memory data grid development

Mutating Information

InvocableMap

• Invoke EntryProcessors on zero or more entries in a NamedCache across a cluster (Data Grid) in parallel* (using Filters) to perform operations

• Execution occurs where the entries are managed in the cluster, not in the thread calling invoke

• This permits Data + Processing Affinity

* Requires Enterprise Edition or above

NamedCache nc = CacheFactory.getCache("stocks");

nc.invokeAll(
    new EqualsFilter("getSymbol", "ORCL"),
    new StockSplitProcessor());

...

class StockSplitProcessor extends AbstractProcessor {
    public Object process(Entry entry) {
        Stock stock = (Stock) entry.getValue();
        stock.quantity *= 2;
        entry.setValue(stock);
        return null;
    }
}

Page 95: An engineer's introduction to in-memory data grid development


Conclusion

Page 96: An engineer's introduction to in-memory data grid development

Data Grid Uses

Caching

Applications request data from the Data Grid rather than backend data sources

Analytics

Applications ask the Data Grid questions from simple queries to advanced scenario modeling

Transactions

Data Grid acts as a transactional System of Record, hosting data and business logic

Events

Automated processing based on events

Page 97: An engineer's introduction to in-memory data grid development

Oracle Coherence - Conclusion

• Protect the Database Investment
– Ensure the DB does what it does best; limit the cost of re-architecture

• Scale as you Grow
– Cost-effective: start small with 3-5 servers, scale to hundreds of servers as the business grows

• Enables Business Continuity
– Providing continuous data availability

Page 98: An engineer's introduction to in-memory data grid development

Questions & Answers

