An engineer’s introduction to in-memory data grid development
Andrei Cristian Niculae
Technology Presales Consultant
Agenda
• What Is Coherence
– Distributed Data Grid
• How Does It Work?
• Use Cases
• Conclusion
• Q&A
Distributed Data Grid
“A Distributed Data Grid is a system composed of multiple servers that work together to manage
information and related operations - such as computations - in a distributed environment.”
Cameron Purdy
VP of Development, Oracle
Scalability Chasm
[Diagram: an ever-expanding universe of users drives Data Demand through web servers and application servers, outpacing Data Supply]
• Data Demand outpacing Data Supply
• Rate of growth outpacing ability to cost-effectively scale applications
Performance Problem
A Performance Bottleneck
[Diagram: Java application objects mapped via SQL to relational database tables]
• Volume
• Complexity
• Frequency of Data Access
Oracle Coherence as Data Broker
[Diagram: the Coherence data grid sits between web/application servers and data sources, serving objects in the middle tier]
• Oracle Coherence brokers Data Supply with Data Demand
• Scale out Data Grid in middle tier using commodity hardware
Coherence In-Memory Data Grid
Linear scaling, extreme performance, unparalleled reliability
• Memory spans multiple machines
• Add/remove nodes dynamically
• Scale linearly to thousands of nodes
• Reliability through redundancy
• Performance through parallelization
• Integration through shared memory grid
How Does It Work?
Oracle Coherence: A Unique Approach
• Dynamically scale out during operation
• Data automatically load-balanced to
new servers in the cluster
• No repartitioning required
• No reconfiguration required
• No interruption to service during scale-out
• Scale capacity and processing on-the-fly
Oracle Coherence: How it Works
• Data is automatically partitioned and load-balanced across the Server Cluster
• Data is synchronously replicated for continuous availability
• Servers monitor the health of each other
• When in doubt, servers work together to diagnose status
• Healthy servers assume responsibility for failed server (in parallel)
• Continuous Operation: No interruption to service or data loss due to a server failure
Oracle Coherence: Summary
• Peer-to-Peer Clustering and Data Management Technology
• No Single Points of Failure
• No Single Points of Bottleneck
• No Masters / Slaves / Registries, etc.
• All members have responsibility to:
• Manage Cluster Health & Data
• Perform Processing and Queries
• Work as a “team” in parallel
• Communication is point-to-point (not TCP/IP) and/or one-to-many
• Scale to limit of the back-plane
• Use with commodity infrastructure
• Linearly Scalable By Design
Coherence Topology Example
Distributed Topology
• Data spread and backed up across Members
• Transparent to developer
• Members have access to all Data
• All Data locations are known – no lookup & no registry!
Coherence Update Example
Distributed Topology
• Synchronous Update
• Avoids potential Data Loss & Corruption
• Predictable Performance
• Backup Partitions are partitioned away from Primaries for resilience
• No engineering requirement to setup Primaries or Backups
• Automatically and Dynamically Managed
Coherence Recovery Example
Distributed Topology
• No in-flight operations lost
• Some latencies (due to higher priority of recovery)
• Reliability, Availability, Scalability, Performance are the priority
• Performance of some requests may degrade
Coherence
Distributed data management for applications
• Development Library
– Pure Java 1.4.2+
– Pure .Net 1.1 and 2.0 (client)
– C++ client (3.4)
– No Third-Party Dependencies
– No Open Source Dependencies
– Proprietary Network Stack (Peer-To-Peer model)
• Other Libraries Support…
– Database and File System Integration
– TopLink and Hibernate
– HTTP Session Management
– WebLogic Portal Caches
– Spring, Groovy
The Portable Object Format
Advanced Serialization
• Simple Serialization Comparison
– In XML
• <date format="java.util.Date">2008-07-03</date>
• 47 characters (possibly 94 bytes depending on encoding)
– In Java (as a raw long)
• 64 bits = 8 bytes
– In Java (java.util.Date using ObjectOutputStream)
• 46 bytes
– In ExternalizableLite (as a raw long)
• 8 bytes
– In POF (Portable Object Format)
• 4F 58 1F 70 6C = 5 bytes
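To give a feel for the API, here is a minimal sketch of a POF-serializable class; the Trade type and its two fields are hypothetical, and the POF indexes (0, 1) are arbitrary choices for illustration:

import java.io.IOException;
import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;

public class Trade implements PortableObject {
    private String symbol;
    private long quantity;

    // POF deserialization requires a public no-arg constructor
    public Trade() {
    }

    // read fields back in the same index order they were written
    public void readExternal(PofReader in) throws IOException {
        symbol   = in.readString(0);
        quantity = in.readLong(1);
    }

    // write each field against an explicit, stable POF index
    public void writeExternal(PofWriter out) throws IOException {
        out.writeString(0, symbol);
        out.writeLong(1, quantity);
    }
}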
Coherence Cache Types / Strategies
Replicated Cache
• Topology: Replicated
• Read Performance: Instant
• Write Performance: Fast
• Fault Tolerance: Extremely High
• Memory Usage (Per JVM): DataSize
• Memory Usage (Total): JVMs x DataSize
• Coherency: Fully coherent
• Locking: Fully transactional
• Typical Uses: Metadata

Optimistic Cache
• Topology: Replicated
• Read Performance: Instant
• Write Performance: Fast
• Fault Tolerance: Extremely High
• Memory Usage (Per JVM): DataSize
• Memory Usage (Total): JVMs x DataSize
• Coherency: Fully coherent
• Locking: None
• Typical Uses: n/a (see Near Cache)

Partitioned Cache
• Topology: Partitioned Cache
• Read Performance: Locally cached: instant; Remote: network speed
• Write Performance: Extremely fast
• Fault Tolerance: Configurable, Zero to Extremely High
• Memory Usage (Per JVM): DataSize/JVMs x Redundancy
• Memory Usage (Total): Redundancy x DataSize
• Coherency: Fully coherent
• Locking: Fully transactional
• Typical Uses: Read-write caches

Near Cache (backed by partitioned cache)
• Topology: Local Caches + Partitioned Cache
• Read Performance: Locally cached: instant; Remote: network speed
• Write Performance: Extremely fast
• Fault Tolerance: Configurable, Zero to Extremely High
• Memory Usage (Per JVM): LocalCache + [DataSize / JVMs]
• Memory Usage (Total): [Redundancy x DataSize] + [JVMs x LocalCache]
• Coherency: Fully coherent
• Locking: Fully transactional
• Typical Uses: Read-heavy caches w/ access affinity

LocalCache (not clustered)
• Topology: Local Cache
• Read Performance: Instant
• Write Performance: Instant
• Fault Tolerance: Zero
• Memory Usage (Per JVM): DataSize
• Memory Usage (Total): n/a
• Coherency: n/a
• Locking: Fully transactional
• Typical Uses: Local data
Data Management Options
Data Management
Partitioned Caching
• Extreme Scalability: Automatically, dynamically and transparently partitions the data set across the members of the grid.
• Pros:
  – Linear scalability of data capacity
  – Processing power scales with data capacity
  – Fixed cost per data access
• Cons:
  – Cost Per Access: High percentage chance that each data access will go across the wire
• Primary Uses:
  – Large in-memory storage environments
  – Parallel processing environments
Data Management
Partitioned Fault Tolerance
• Automatically, dynamically and transparently manages the fault tolerance of your data.
• Backups are guaranteed to be on a separate physical machine from the primary.
• Backup responsibilities for one node's data are shared amongst the other nodes in the grid.
Data Management
Cache Client/Cache Server
• Partitioning can be controlled on a member-by-member basis.
• A member is either responsible for an equal partition of the data or not ("storage enabled" vs. "storage disabled")
• Cache Clients – typically the application instances
• Cache Servers – typically stand-alone JVMs responsible for storage and data processing only.
Data Management
Near Caching
• Extreme Scalability and Extreme Performance: The best of both worlds between the Replicated and Partitioned topologies. Most recently/frequently used data is stored locally.
• Pros:
  – All of the same Pros as the Partitioned topology, plus...
  – High percentage chance data is local to the request
• Cons:
  – Cost Per Update: There is a cost associated with each update to a piece of data that is stored locally on other nodes
• Primary Use:
  – Large in-memory storage environments with likelihood of repetitive data access
Caching Patterns and Terminology
Cache Aside
• Cache Aside means the developer manages the contents of the cache manually:
  – Check the cache before reading from the data source
  – Put data into the cache after reading from the data source
  – Evict or update the cache when updating the data source
• Cache Aside can be written as a core part of the application logic, or as a cross-cutting concern (using AOP) if the data access pattern is generic – see the sketch below
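A minimal cache-aside sketch under these rules, assuming a hypothetical "products" cache plus hypothetical Product and ProductDao classes:

NamedCache cache = CacheFactory.getCache("products");

// read: check the cache before hitting the data source
Product product = (Product) cache.get(productId);
if (product == null) {
    product = productDao.load(productId); // read from the data source
    cache.put(productId, product);        // populate the cache afterwards
}

// update: write the data source, then evict the now-stale entry
productDao.save(product);
cache.remove(productId);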
Read Through / Write Through
• Read Through
  – All data reads occur through the cache
  – If there is a cache miss, the cache will load the data from the data source automatically
• Write Through
  – All data writes occur through the cache
  – Updates to the cache are written synchronously to the data source
Write Behind
• All data writes occur through the cache
  – Just like Write Through
• Updates to the cache are written asynchronously to the data source
How it Works
CacheStore
• CacheLoader and CacheStore (normally referred to generically as a CacheStore) are generic interfaces
• There are no restrictions on the type of data store that can be used:
– Databases
– Web services
– Mainframes
– File systems
– Remote clusters
– …even another Coherence cluster!
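As an illustration, a minimal database-backed store might look like the sketch below; TradeDao and Trade are hypothetical, while AbstractCacheStore and the load/store/erase callbacks come from the com.tangosol.net.cache API:

import com.tangosol.net.cache.AbstractCacheStore;

public class TradeCacheStore extends AbstractCacheStore {
    private final TradeDao dao = new TradeDao(); // hypothetical DAO

    // called on a cache miss (read through / refresh ahead)
    public Object load(Object key) {
        return dao.findById((String) key);
    }

    // called on cache updates (write through / write behind)
    public void store(Object key, Object value) {
        dao.save((String) key, (Trade) value);
    }

    // called when an entry is erased from the cache
    public void erase(Object key) {
        dao.delete((String) key);
    }
}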
Read Through
• Every key in a Coherence cache is assigned (hashed) to a specific cache server
• If a get results in a cache miss, the cache entry will be loaded from the data store via the cache server, through the provided CacheLoader implementation
• Multiple simultaneous requests for the same key will result in a single load request to the data store
  – This is a key advantage over Cache Aside
Refresh Ahead
• Coherence can use a CacheLoader to fetch data asynchronously from the data store before expiry
• This means that no client threads are blocked while the data is refreshed from the data store
• This mechanism is triggered by a cache get – items that are not accessed will be allowed to expire, requiring a read through on the next access
Write Through
• Updates to the cache will be synchronously written to the data store via the cache server responsible for the given key, through the provided CacheStore
• Advantages
  – A write to the cache can be rolled back upon a data store error
• Disadvantages
  – Write performance can only scale as far as the database will allow
Write Behind
• Updates to the cache will be asynchronously written to the data store via the cache server responsible for the given key, through the provided CacheStore
Write Behind Factors
• Advantages
  – Application response time and scalability are decoupled from the data store; ideal for write-heavy applications
  – Multiple updates to the same cache entry are coalesced, reducing the load on the data store
  – Application can continue to function upon data store failure
• Disadvantages
  – In the event of multiple simultaneous server failures, some data in the write-behind queue may be lost
  – Loss of a single physical server will result in no data loss
Data Processing Options
Data Processing
Parallel Query
• Programmatic query mechanism
• Queries performed in parallel across the grid
• Standard indexes provided out-of-the-box, with support for implementing your own custom indexes

// get the "myTrades" cache
NamedCache cacheTrades = CacheFactory.getCache("myTrades");

// create the "query"
Filter filter = new AndFilter(
    new EqualsFilter("getTrader", traderid),
    new EqualsFilter("getStatus", Status.OPEN));

// perform the parallel query
Set setOpenTrades = cacheTrades.entrySet(filter);
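Indexes can make filters like the one above cheaper to evaluate; a brief sketch using the out-of-the-box ReflectionExtractor (the indexed attribute is an illustrative choice):

// index the getTrader attribute so EqualsFilter("getTrader", ...) can
// avoid deserializing every entry; true = maintain an ordered index
cacheTrades.addIndex(new ReflectionExtractor("getTrader"), true, null);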
Data Processing
Continuous Query Cache
• Automatically, transparently and dynamically maintains a local view based on specific criteria (i.e. a Filter)
• Same API as all other Coherence caches
• Supports local listeners
• Supports layered views
Data Processing
Continuous Query Cache
// get the "myTrades" cache
NamedCache cacheTrades = CacheFactory.getCache("myTrades");

// create the "query"
Filter filter = new AndFilter(
    new EqualsFilter("getTrader", traderid),
    new EqualsFilter("getStatus", Status.OPEN));

// create the continuous query cache
NamedCache cqcOpenTrades =
    new ContinuousQueryCache(cacheTrades, filter);
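Since a ContinuousQueryCache is itself observable, listeners can be attached to it; a brief sketch using the out-of-the-box MultiplexingMapListener:

// react to changes in the locally maintained view as they arrive
cqcOpenTrades.addMapListener(new MultiplexingMapListener() {
    protected void onMapEvent(MapEvent evt) {
        System.out.println("open trades changed: " + evt);
    }
});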
Data Processing
Invocable Map
• The inverse of caching: sends the processing (e.g. EntryProcessors) to where the data is in the grid
• Standard EntryProcessors provided out-of-the-box
• Once-and-only-once guarantees
• Processing is automatically fault-tolerant
• Processing can be:
  • Targeted to a specific key
  • Targeted to a collection of keys
  • Targeted to any object that matches specific criteria (i.e. a Filter)
Data Processing
Invocable Map
// get the "myTrades" cache
NamedCache cacheTrades = CacheFactory.getCache("myTrades");

// create the "query"
Filter filter = new AndFilter(
    new EqualsFilter("getTrader", traderid),
    new EqualsFilter("getStatus", Status.OPEN));

// perform the parallel processing
cacheTrades.invokeAll(filter, new CloseTradeProcessor());
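The slide does not show CloseTradeProcessor itself; a minimal sketch of what such an EntryProcessor could look like, assuming a hypothetical Trade class with a settable status:

class CloseTradeProcessor extends AbstractProcessor {
    public Object process(InvocableMap.Entry entry) {
        // executed where the entry lives, once and only once
        Trade trade = (Trade) entry.getValue();
        trade.setStatus(Status.CLOSED);
        entry.setValue(trade); // write the mutation back to the cache
        return null;
    }
}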
Code Examples
Clustering Java Processes
• Joins an existing cluster or forms a new cluster
  – Time "to join" is configurable
• cluster contains information about the Cluster
  – Cluster Name
  – Members
  – Locations
  – Processes
• No "master" servers
• No "server registries"

Cluster cluster = CacheFactory.ensureCluster();
Leaving a Cluster
• Leaves the current cluster
• shutdown blocks until "data" is safe
• Failing to call shutdown results in Coherence having to detect process death/exit and recover information from another process
• Death detection and recovery is automatic
CacheFactory.shutdown();
Using a Cache
get, put, size & remove
• CacheFactory resolves cache names (e.g. "mine") to configured NamedCaches
• NamedCache provides data-topology-agnostic access to information
• NamedCache implements several interfaces:
  – java.util.Map, JCache, ObservableMap*, ConcurrentMap*, QueryMap*, InvocableMap*

NamedCache nc = CacheFactory.getCache("mine");

Object previous = nc.put("key", "hello world");
Object current = nc.get("key");
int size = nc.size();
Object value = nc.remove("key");

* Coherence Extensions
Using a Cache
keySet, entrySet, containsKey
• Using a NamedCache is like using a java.util.Map
• What is the difference between a Map and a Cache data structure?
  – Both use (key, value) pairs for entries
  – Map entries don't expire; Cache entries may expire
  – Maps are typically limited by heap space; Caches are typically size-limited (by number of entries or memory)
  – Map content is typically in-process (on heap)

NamedCache nc = CacheFactory.getCache("mine");

Set keys = nc.keySet();
Set entries = nc.entrySet();
boolean exists = nc.containsKey("key");
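Because cache entries may expire, NamedCache also extends CacheMap, which adds a put variant taking a time-to-live in milliseconds; a one-line sketch:

// this entry expires roughly five seconds after the put
nc.put("key", "hello world", 5000);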
Observing Cache Changes
ObservableMap
• Observe changes in real-time as they occur in a NamedCache
• Options exist to optimize events by using Filters (including pre- and post-condition checking) and reducing on-the-wire payload (Lite Events)
• Several MapListeners are provided out-of-the-box
  – Abstract, Multiplexing...

NamedCache nc = CacheFactory.getCache("stocks");

nc.addMapListener(new MapListener() {
    public void entryInserted(MapEvent mapEvent) {
    }
    public void entryUpdated(MapEvent mapEvent) {
    }
    public void entryDeleted(MapEvent mapEvent) {
    }
});
Querying Caches
QueryMap
• Query NamedCache keys and entries across a cluster (Data Grid) in parallel* using Filters
• Results may be ordered using natural ordering or custom comparators
• Filters support almost all SQL constructs
• Query using non-relational data representations and models
• Create your own Filters

* Requires Enterprise Edition or above

NamedCache nc = CacheFactory.getCache("people");

Set keys = nc.keySet(
    new LikeFilter("getLastName", "%Stone%"));

Set entries = nc.entrySet(
    new EqualsFilter("getAge", 35));
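For the ordering mentioned above, QueryMap also offers an entrySet overload taking a Comparator; passing null falls back to the natural ordering of the entry values (a brief sketch):

// entries matching the filter, sorted by natural ordering
Set ordered = nc.entrySet(new EqualsFilter("getAge", 35), null);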
Continuous Observation
Continuous Query Caches
• ContinuousQueryCache provides a real-time, in-process copy of filtered cached data
• Use standard or your own custom Filters to limit the view
• Access to the "view" of cached information is instant
• May be used with MapListeners to support rendering real-time local views (aka Thick Client) of Data Grid information

NamedCache nc = CacheFactory.getCache("stocks");

NamedCache expensiveItems =
    new ContinuousQueryCache(nc,
        new GreaterFilter("getPrice", 1000));
Aggregating Information
InvocableMap
• Aggregate values in a NamedCache across a cluster (Data Grid) in parallel* using Filters
• Aggregation constructs include: Distinct, Sum, Min, Max, Average, Having, Group By
• Aggregate using non-relational data models
• Create your own aggregators

* Requires Enterprise Edition or above

NamedCache nc = CacheFactory.getCache("stocks");

Double total = (Double) nc.aggregate(
    AlwaysFilter.INSTANCE,
    new DoubleSum("getQuantity"));

Set symbols = (Set) nc.aggregate(
    new EqualsFilter("getOwner", "Larry"),
    new DistinctValues("getSymbol"));
Mutating Information
InvocableMap
• Invoke EntryProcessors on zero or more entries in a NamedCache across a cluster (Data Grid) in parallel* (using Filters) to perform operations
• Execution occurs where the entries are managed in the cluster, not in the thread calling invoke
• This permits Data + Processing Affinity

* Requires Enterprise Edition or above

NamedCache nc = CacheFactory.getCache("stocks");

nc.invokeAll(
    new EqualsFilter("getSymbol", "ORCL"),
    new StockSplitProcessor());

...

class StockSplitProcessor extends AbstractProcessor {
    public Object process(InvocableMap.Entry entry) {
        Stock stock = (Stock) entry.getValue();
        stock.quantity *= 2; // double the share count on a 2-for-1 split
        entry.setValue(stock);
        return null;
    }
}
Conclusion
Data Grid Uses
• Caching: Applications request data from the Data Grid rather than backend data sources
• Analytics: Applications ask the Data Grid questions, from simple queries to advanced scenario modeling
• Transactions: Data Grid acts as a transactional System of Record, hosting data and business logic
• Events: Automated processing based on events
Oracle Coherence - Conclusion
• Protect the Database Investment
  – Ensure the DB does what it does best; limit the cost of re-architecting
• Scale as you Grow
  – Cost-effective: start small with 3-5 servers, scale to hundreds of servers as the business grows
• Enables business continuity
  – Providing continuous data availability
Questions & Answers