Post on 01-Nov-2014
GigaSpaces Data Caching / Data Grid overview
August 2009
Scaling Up Your Database by Adding a Data Grid
• To scale up your database, use the IMDG directly
• On the backend, the IMDG persists the data to your database using your
existing Hibernate O/R mapping.
• Hibernate used by the IMDG
• Application using Native IMDG API - object/SQL API very similar to
Hibernate
• Gain full power of the IMDG
• Good for write and read scenarios
Benefits of using GigaSpaces as the system of record
• Decreasing database load through partitioning and
data distribution - enables higher data volumes and
higher throughput with low latency
• Better decoupling between your application and the
database - no need to hard-wire Hibernate and database
concepts into your code and runtime environment
• Event-driven model enables notifications when data is
modified
• Database access can be synchronous or
asynchronous - the GigaSpaces Mirror Service allows
data to be persisted to the database asynchronously,
without a performance penalty
IMDG Access support
• Main Features:
– Direct persistency (Write/Read Through)
– Asynchronous Reliable persistency (Write Behind)
– Fast Data load once IMDG started
– Lazy load in case of a cache miss
– Delegating IMDG SQL Queries to database
– Advanced Hibernate and nHibernate integration
– Java , C++ and .Net objects persistency
– Custom persistency support
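The read-through, write-through and lazy-load features listed above can be pictured with a small stand-alone sketch. It is not the GigaSpaces API: `ReadThroughGrid` is a hypothetical class, and a plain `Map` stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; a Map plays the role of the relational database.
final class ReadThroughGrid {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> database;
    private int databaseReads = 0;

    ReadThroughGrid(Map<String, String> database) {
        this.database = database;
    }

    // Read-through with lazy load: the database is consulted only on a cache miss.
    String read(String id) {
        String cached = memory.get(id);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String loaded = database.get(id);
        if (loaded != null) {
            memory.put(id, loaded);
        }
        return loaded;
    }

    // Write-through: the database and the in-memory copy are updated in the same call.
    void write(String id, String value) {
        database.put(id, value);
        memory.put(id, value);
    }

    // Number of reads that had to go to the database.
    int databaseReads() {
        return databaseReads;
    }
}
```

Reading the same id twice touches the database once; the second read is served from memory.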
Step 2: Access data via IMDG SQL Queries
• Supported Options and Queries
– Operations: =, <>, <, >, >=, <=, [NOT] like, is [NOT] null, IN
– GROUP BY – performs DISTINCT on the POJO properties
– ORDER BY (ASC | DESC)
SQLQuery rquery = new SQLQuery(MyPojo.class,"firstName rlike '(a|c).*' or ago > 0 and lastName rlike '(d|k).*'");
Object[] result = space.readMultiple(rquery);
• Dynamic Query Support
SQLQuery query = new SQLQuery(MyClass.class, "firstName = ? or lastName = ? and ago > ?");
query.setParameters("david", "lee", 50);
• Supported Options via JDBC API
– COUNT, MAX, MIN, SUM, AVG, DISTINCT, Blob and Clob, rownum, sysdate, table aliases
– Join with 2 tables
• Not Supported
– HAVING, VIEW, TRIGGERS, EXISTS, BETWEEN, NOT, CREATE USER, GRANT, REVOKE, SET PASSWORD, CONNECT USER, ON
– NOT NULL, IDENTITY, UNIQUE, PRIMARY KEY, Foreign Key/REFERENCES, NO ACTION, CASCADE, SET NULL, SET DEFAULT, CHECK
– UNION, MINUS, UNION ALL
– STDEV, STDEVP, VAR, VARP, FIRST, LAST
– LEFT, RIGHT [INNER] or [OUTER] JOIN
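The parameterized query on this slide mixes `or` and `and` without parentheses. Assuming the IMDG follows the standard SQL rule that `and` binds tighter than `or` (the slide does not state this), the query with parameters "david", "lee", 50 matches entries whose firstName is "david", plus entries whose lastName is "lee" and whose `ago` value exceeds 50. The sketch below expresses that reading with plain Java predicates. `QueryPerson` and `DynamicQuerySketch` are hypothetical names, not part of the product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical POJO with the three properties used by the query on the slide.
final class QueryPerson {
    final String firstName;
    final String lastName;
    final int ago;

    QueryPerson(String firstName, String lastName, int ago) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.ago = ago;
    }
}

final class DynamicQuerySketch {
    // Mirrors "firstName = ? or lastName = ? and ago > ?" with AND evaluated before OR
    // (assumed standard SQL precedence).
    static Predicate<QueryPerson> bind(String first, String last, int minAgo) {
        Predicate<QueryPerson> firstNameMatches = p -> p.firstName.equals(first);
        Predicate<QueryPerson> lastNameAndAgo = p -> p.lastName.equals(last) && p.ago > minAgo;
        return firstNameMatches.or(lastNameAndAgo);
    }

    // Stand-in for readMultiple: returns every entry that satisfies the query.
    static List<QueryPerson> readMultiple(List<QueryPerson> entries, Predicate<QueryPerson> query) {
        List<QueryPerson> result = new ArrayList<>();
        for (QueryPerson p : entries) {
            if (query.test(p)) {
                result.add(p);
            }
        }
        return result;
    }
}
```

If the intended meaning is "(firstName or lastName) and ago", the expression needs explicit parentheses.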
GigaSpaces In-Memory-Data-Grid
The IMDG – Runtime Modes – Embedded
• An IMDG (space) instance that runs within the application memory address
space
• Accessed by reference without going through network or serialization calls
• Most efficient configuration mode
• Used as the primary space configuration setup
[Diagram: Client Application (C++) and the space inside one Virtual Machine]
The IMDG – Runtime Modes – Remote
• Accessing a remote space involves network calls and
serialization/de-serialization of the cached objects between the
client and the space process
• Used only in cases where:
– Client application cannot run an embedded space (due to memory capacity limitations, etc.)
– There are a large number of concurrent updates on the same cached object using different remote processes
[Diagram: Client Applications (C++) in their own Virtual Machines accessing a space in a separate Virtual Machine]
The IMDG – Runtime Modes – Master-Local Cache
• A local ‘cache’
– Embedded with a client
– Set of cached objects is a snapshot
– No additional objects get added to the local space unless new queries are made
– Writes should be made on the master only
• Use when
– Many distributed clients
– Accessing the same space
– Read-mostly
[Diagram: Client Application with a local cache connected to the Master Space]
The IMDG – Runtime Modes – Master-Local View
• A local 'View'
– Embedded with a client
– Contains updated and changing results based on a client specified query
• Used when
– Clients want to get a streaming view of a subset of the 'main' space
• Writes can be made to the view
– Contains a proxy to the master
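A local view can be thought of as a continuously maintained query result. The sketch below is a simplified in-process model, not the GigaSpaces implementation: `MasterSpaceSketch` and `LocalViewSketch` are hypothetical names, entries are plain integers keyed by id, and the master pushes every change straight to its views.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical client-side view: holds only the entries that match the client's query.
final class LocalViewSketch {
    private final MasterSpaceSketch master;
    private final Predicate<Integer> query;
    private final Map<String, Integer> entries = new HashMap<>();

    LocalViewSketch(MasterSpaceSketch master, Predicate<Integer> query) {
        this.master = master;
        this.query = query;
    }

    // Called by the master on every change; keeps the view in line with the query.
    void onChange(String id, Integer value) {
        if (query.test(value)) {
            entries.put(id, value);
        } else {
            entries.remove(id);
        }
    }

    // Reads are served locally, without contacting the master.
    Integer read(String id) {
        return entries.get(id);
    }

    // Writes go through the proxy to the master, which then updates the views.
    void write(String id, int value) {
        master.write(id, value);
    }
}

// Hypothetical master space that streams changes to registered views.
final class MasterSpaceSketch {
    private final Map<String, Integer> entries = new HashMap<>();
    private final List<LocalViewSketch> views = new ArrayList<>();

    LocalViewSketch createView(Predicate<Integer> query) {
        LocalViewSketch view = new LocalViewSketch(this, query);
        entries.forEach(view::onChange); // populate with the current matches
        views.add(view);
        return view;
    }

    void write(String id, int value) {
        entries.put(id, value);
        for (LocalViewSketch view : views) {
            view.onChange(id, value);
        }
    }
}
```

An entry enters the view when it starts matching the query and leaves it when an update makes it stop matching.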
[Diagram: Client Application with a local view connected to the Master Space]
The IMDG – Runtime Modes – Persistent
• Stores data both into memory and on disk in a relational database
• Can use custom mapping or the built-in Hibernate/nHibernate plug-in
[Diagram: Feeder (bulk), Primary and Backup (sync replication), Mirror (async replication), Database (initial load); each runs in its own Virtual Machine]
Asynchronous Reliable Persistency - Write Behind
• The most common architecture
• Database is out of the critical path of the transaction
• IMDG operations and data are delegated to the database in a reliable, consistent manner
• Supports read and write scenarios
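The write-behind idea above (the database stays out of the critical path while operations reach it later in order) can be sketched without any grid product. `WriteBehindSketch` is a hypothetical class; a `Map` stands in for the database, and `flush()` is called explicitly here, whereas a real mirror would drain the queue in the background.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration of write-behind; not the GigaSpaces Mirror Service.
final class WriteBehindSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Queue<String[]> pending = new ArrayDeque<>();
    private final Map<String, String> database;

    WriteBehindSketch(Map<String, String> database) {
        this.database = database;
    }

    // The caller only waits for the in-memory update; the database write is queued.
    void write(String id, String value) {
        memory.put(id, value);
        pending.add(new String[] {id, value});
    }

    String read(String id) {
        return memory.get(id);
    }

    // Applies queued operations to the database in the order they were written.
    int flush() {
        int applied = 0;
        String[] op;
        while ((op = pending.poll()) != null) {
            database.put(op[0], op[1]);
            applied++;
        }
        return applied;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Between `write` and `flush` the value is readable from memory but not yet in the database, which is the trade-off this mode accepts.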
The Initial Load – Fast Data load from the Database
[Diagram: three partitions, each with a Primary and a Backup in separate GSCs linked by replication; Partition 1 performs its initial load using Query 1, Partition 2 using Query 2 and Partition 3 using Query 3 (Hibernate)]
IMDG Deployment Topologies
IMDG Basic Deployment Topologies
• Primary-Backup
[Diagram: a Primary replicating to a Backup, each in its own Virtual Machine]
• Partitioned
[Diagram: a Feeder writing to partitions spread over several Virtual Machines]
• Partitioned + Backup
[Diagram: a Feeder writing to Primary 1 and Primary 2, which replicate to Backup 1 and Backup 2]
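The Partitioned + Backup topology can be modelled in a few lines. The sketch assumes that an entry is routed by the hash of its key modulo the number of partitions and that each write is copied to the backup before returning. Both are assumptions for illustration, and `PartitionSketch` and `PartitionedGridSketch` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical partition: a primary copy plus a synchronously updated backup copy.
final class PartitionSketch {
    private Map<String, String> primary = new HashMap<>();
    private Map<String, String> backup = new HashMap<>();

    void write(String key, String value) {
        primary.put(key, value);
        backup.put(key, value); // synchronous replication to the backup
    }

    String read(String key) {
        return primary.get(key);
    }

    // Failover: the backup becomes the primary and a fresh backup is seeded from it.
    void failPrimary() {
        primary = backup;
        backup = new HashMap<>(primary);
    }
}

final class PartitionedGridSketch {
    private final PartitionSketch[] partitions;

    PartitionedGridSketch(int count) {
        partitions = new PartitionSketch[count];
        for (int i = 0; i < count; i++) {
            partitions[i] = new PartitionSketch();
        }
    }

    // Assumed routing rule: hash of the routing key modulo the partition count.
    int partitionOf(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), partitions.length);
    }

    void write(String key, String value) {
        partitions[partitionOf(key)].write(key, value);
    }

    String read(String key) {
        return partitions[partitionOf(key)].read(key);
    }

    void failPrimary(int partition) {
        partitions[partition].failPrimary();
    }
}
```

Because the same key always maps to the same partition, a read after a primary failure still finds the entry on the promoted backup.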
IMDG Operations
IMDG Basic Operations
[Diagram: the Application interacting with the Space through each basic operation]
• Write, WriteMultiple
• Read, ReadMultiple
• Take, TakeMultiple
• Execute
• Notify
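The semantics of the basic operations can be illustrated with a toy in-memory space. This is a sketch under simplifying assumptions, not the product API: `MiniSpace` is a hypothetical class, templates are modelled as predicates, and `onWrite` plays the role of Notify. Read returns a match and leaves it in the space, while Take returns a match and removes it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical toy space; single-threaded and in-process for illustration.
final class MiniSpace<T> {
    private final List<T> entries = new ArrayList<>();
    private final List<Consumer<T>> listeners = new ArrayList<>();

    // Write: store the entry and notify registered listeners.
    void write(T entry) {
        entries.add(entry);
        for (Consumer<T> listener : listeners) {
            listener.accept(entry);
        }
    }

    // Read: return the first matching entry without removing it.
    T read(Predicate<T> template) {
        for (T entry : entries) {
            if (template.test(entry)) {
                return entry;
            }
        }
        return null;
    }

    // Take: return the first matching entry and remove it from the space.
    T take(Predicate<T> template) {
        Iterator<T> it = entries.iterator();
        while (it.hasNext()) {
            T entry = it.next();
            if (template.test(entry)) {
                it.remove();
                return entry;
            }
        }
        return null;
    }

    // ReadMultiple: return all matching entries.
    List<T> readMultiple(Predicate<T> template) {
        List<T> result = new ArrayList<>();
        for (T entry : entries) {
            if (template.test(entry)) {
                result.add(entry);
            }
        }
        return result;
    }

    // Notify: register a listener that is called on every subsequent write.
    void onWrite(Consumer<T> listener) {
        listeners.add(listener);
    }
}
```

Take is what makes the space usable for messaging and workflow: an entry taken by one consumer is no longer visible to the others.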
Move into SBA
What is Space Based Architecture (SBA)
What is a Processing Unit:
• Bundle of services, data, messaging
• Collocation into single VM
• Unified Messaging & Data
• In-Memory
Cloud of Processing Units
• Scale through Partitioning
• Virtualized middleware
Space-Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of stateful, high-performance applications, based on Yale’s Tuple-Space Model (Source: Wikipedia)
[Diagram: the Application writes a POJO to the Space]
What is a Space:
• Elegant – 4 API
• Solves:
• Data sharing
• Messaging
• Workflow
• Parallel processing
Move into SBA
• Deploy Application components as Processing Units
– Form a composite SOA Application
• Distributed Data Processing
– Use GigaSpaces event driven and data processing components to process incoming data in real time
• Collocate business logic and Data
– Scale these as one entity to allow true linear scalability
Space Based Architecture – Business logic and data collocated
[Diagram: a Feeder Processing Unit (Service Bean) pushes data into the backend system; three Processing Units, each with a Service Bean collocated with an In-Memory-Data-Grid partition (Primary 1 to 3, replicating to Backup 1 to 3), process the data; a Collector Processing Unit (Service Bean) collects results as a reporting service]
Map-Reduce Approach to perform Parallel Query
• How do the GigaSpaces Task Executors work?
– Phase 1 - Sending the Task to be executed
– Phase 2 - Getting the results back to be reduced
– The Task itself queries the IMDG instance and performs whatever calculations are needed
Distributed Task Example
import java.util.List;

import com.gigaspaces.async.AsyncFuture;
import com.gigaspaces.async.AsyncResult;
import org.openspaces.core.executor.DistributedTask;

public class MyDistTask implements DistributedTask<Integer, Long> {

    // The Task execute implementation: runs on the space side
    public Integer execute() throws Exception {
        return 1;
    }

    // The Task reducer implementation: runs on the client side
    public Long reduce(List<AsyncResult<Integer>> results) throws Exception {
        long sum = 0;
        for (AsyncResult<Integer> result : results) {
            if (result.getException() != null) {
                throw result.getException();
            }
            sum += result.getResult();
        }
        return sum;
    }
}

// The Task execution: called from the client side
AsyncFuture<Long> future = gigaSpace.execute(new MyDistTask());
long result = future.get(); // result will be the number of primary spaces
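The two phases can also be imitated without a grid, which makes the data flow easier to follow. In this sketch each partition is just a list, the task runs once per partition (sequentially here, whereas a grid would run the tasks on the partitions concurrently), and the reducer combines the partial results on the caller's side. `TaskExecutorSketch`, `execute` and `countMatching` are hypothetical names, not the GigaSpaces API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a task executor over partitioned data.
final class TaskExecutorSketch {

    static <E, T, R> R execute(List<List<E>> partitions,
                               Function<List<E>, T> task,
                               Function<List<T>, R> reducer) {
        // Phase 1: send the task to every partition and collect one partial result each.
        List<T> partials = new ArrayList<>();
        for (List<E> partition : partitions) {
            partials.add(task.apply(partition));
        }
        // Phase 2: reduce the partial results into a single answer.
        return reducer.apply(partials);
    }

    // Parallel-query example: count the entries that start with a prefix.
    static long countMatching(List<List<String>> partitions, String prefix) {
        return TaskExecutorSketch.<String, Long, Long>execute(
            partitions,
            partition -> partition.stream().filter(s -> s.startsWith(prefix)).count(),
            partials -> partials.stream().mapToLong(Long::longValue).sum());
    }
}
```

Only the small per-partition counts travel back to the caller; the entries themselves stay where they are stored, which is the point of collocating the query with the data.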
SBA Fundamentals
Using SBA to Virtualize the Middleware = GigaSpaces XAP
• Steps to virtualize the middleware:
1. Decouple the application from the deployment environment
2. Use partitioning to split the load and the data
3. Move manual processes to SLA-driven deployment
4. Inject dynamic scaling and self-healing
• The result: a scale-out application server providing:
– End-to-end scale-out middleware for Web data, messaging and business logic
– In-memory clustering
– Unique database scalability
– Automatic self-healing
• Enterprise-grade and OEM-ready:
– Supports open-source and standard development frameworks
– Supports Java, .NET, C++ and scripting languages
The Service Grid
• Continuous application availability to achieve five nines (99.999%)
– Automatically provision additional resources after failures
• Maintain optimal application performance
– Dynamically scale (or shrink) system resources based upon business demand
• Dramatic reduction in enterprise server utilization rates
– Dynamic provisioning eliminates the need to design for peak loads
• Significant reduction in IT Operations and system management costs
An automated, SLA-based application provisioning & management engine
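SLA-based provisioning amounts to a reconcile loop: compare what is running with what the SLA requires and start replacements for anything missing. The sketch below is a deliberately small model of that idea. `ServiceGridSketch` and its instance names are hypothetical, and `reconcile()` is called by hand here where a real engine would run it continuously.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of SLA-driven provisioning; not the GigaSpaces Service Grid API.
final class ServiceGridSketch {
    private final int requiredInstances; // the SLA: how many instances must be running
    private final List<String> running = new ArrayList<>();
    private int nextId = 1;

    ServiceGridSketch(int requiredInstances) {
        this.requiredInstances = requiredInstances;
        reconcile();
    }

    // Starts new instances until the SLA is met; returns how many were started.
    int reconcile() {
        int started = 0;
        while (running.size() < requiredInstances) {
            running.add("instance-" + nextId++);
            started++;
        }
        return started;
    }

    // Simulates a failure of one running instance.
    boolean fail(String instance) {
        return running.remove(instance);
    }

    // Snapshot of the instances that are currently running.
    List<String> running() {
        return new ArrayList<>(running);
    }
}
```

After a failure the next reconcile pass provisions a replacement, so the instance count returns to the SLA without operator action.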
Typical Web Application Architecture
[Diagram: a Web Browser reaches a Load Balancer (Apache) in front of Web Containers, each running as a Processing Unit inside a Grid Service Container; behind them, Processing Units with Service Beans run in Grid Service Containers alongside Primary 1/Backup 1 and Primary 2/Backup 2, linked by replication; a Grid Service Manager and an Admin Application complete the picture]
• Dynamic LB configuration
• Managed Jetty Web Containers, HTTP session on top of the Space
• Business logic and data on top of the Data Grid
• Interact with BL and data via Space API, events, remoting or task executors
• Partitioning and collocation for best performance and scalability
• Async. persistency
• Proactive administration