© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.
In-memory data and compute on top of Hadoop
Jags Ramnarayan – Chief Architect, Fast Data, Pivotal
Anthony Baker – Architect, Fast Data, Pivotal
Agenda
• In-memory data grid – concepts, strengths, weaknesses
• HDFS – strengths, weaknesses
• What is our proposal?
• How do you use this? SQL syntax and demo
• HDFS integration architecture and demo
• MapReduce integration and demo
  – In-memory, parallel stored procedures
• Comparison to HBase
"It is raining databases in the cloud" (The 451 Group)
• Next-gen transactional DBs are memory-based, distributed, elastic, HA, cloud-ready…
  – In-memory data grids (IMDG), NoSQL, caching
  – Pivotal GemFire, Oracle Coherence, Redis, Cassandra, …
• Next-gen OLAP DBs are centered around Hadoop
  – Driver: they say it is "volume, velocity, variety"
  – Or is it just cost/TB?
IMDG basic concepts
• Distributed memory-oriented store
  – KV/objects or SQL
  – Queryable, indexable, and transactional
• Multiple storage models
  – Replication or partitioning in memory
  – With synchronous copies in the cluster
  – Overflow to disk and/or RDBMS
• Parallelize Java app logic
• Multiple failure-detection schemes
• Dynamic membership (elastic)
• Vendors differentiate on SQL support, WAN, events, etc.
[Diagram: replicated regions use synchronous replication for slow-changing data; partitioned regions with redundant copies hold large or highly transactional data; the grid handles thousands of concurrent connections at low latency]
Key IMDG pattern – distributed caching
• Designed to work with existing RDBs
  – Read through: fetch from the DB on a cache miss
  – Write through: reflect in the cache IFF the DB write succeeds
  – Write behind: reliable, in-order queue and batched writes to the DB
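The three caching patterns above can be sketched in plain Java. This is an illustrative toy, not a real IMDG API; the class and method names are invented:

```java
import java.util.*;
import java.util.function.Function;

// Minimal sketch of read-through and write-behind caching.
class ReadThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();          // in-memory table
    private final Function<K, V> dbLoader;                     // fetch on cache miss
    private final Deque<Map.Entry<K, V>> writeQueue = new ArrayDeque<>();

    ReadThroughCache(Function<K, V> dbLoader) { this.dbLoader = dbLoader; }

    // Read-through: go to the DB only on a cache miss
    V get(K key) {
        return memory.computeIfAbsent(key, dbLoader);
    }

    // Write-behind: update memory now, queue the DB write for later batching
    void put(K key, V value) {
        memory.put(key, value);
        writeQueue.add(Map.entry(key, value));
    }

    // Drain the queue as one in-order batch (a real grid does this async and reliably)
    List<Map.Entry<K, V>> flushBatch() {
        List<Map.Entry<K, V>> batch = new ArrayList<>(writeQueue);
        writeQueue.clear();
        return batch;
    }
}
```

A write-through variant would instead perform the DB write synchronously inside `put` and only update `memory` if it succeeds.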
Traditional RDB integration can be challenging
Synchronous "write through" (memory tables → DB writer):
• Single point of bottleneck and failure
• Not an option for write-heavy workloads
• Complex two-phase commit protocol; parallel recovery is difficult
Asynchronous "write behind" (memory tables → queue → DB synchronizer applying updates in batches):
• Cannot sustain high write rates
• Queue may have to be persistent
• Parallel recovery is difficult
Some IMDGs and NoSQL stores offer "shared nothing" persistence
• Append-only operation logs
• Fully parallel; zero disk seeks
• But cluster restart requires a log scan
• Very large volumes pose challenges
[Diagram: each member's memory tables append records through OS buffers into append-only operation logs, which a log compressor periodically rewrites]
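The mechanics of shared-nothing persistence can be sketched in a few lines of Java. This is illustrative only (in-memory lists stand in for disk files), but it shows why appends are seek-free while restart requires scanning the whole log:

```java
import java.util.*;

// Sketch of shared-nothing persistence via an append-only operation log.
class OpLogStore {
    private final Map<String, String> memory = new HashMap<>();
    private final List<String[]> opLog = new ArrayList<>();  // append-only "disk"

    void put(String key, String value) {
        memory.put(key, value);
        opLog.add(new String[] { "PUT", key, value });       // sequential append, no seek
    }

    void delete(String key) {
        memory.remove(key);
        opLog.add(new String[] { "DELETE", key, null });
    }

    // Restart: the in-memory table is rebuilt by scanning the entire log
    Map<String, String> recover() {
        Map<String, String> rebuilt = new HashMap<>();
        for (String[] op : opLog) {
            if (op[0].equals("PUT")) rebuilt.put(op[1], op[2]);
            else rebuilt.remove(op[1]);
        }
        return rebuilt;
    }
}
```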
Hadoop core (HDFS) for scalable, parallel storage
• Maturing and will be ubiquitous
• Handles very large data sets on commodity hardware
• Handles failures well
• Simple coherency model

Hadoop design center – batch and sequential
• 64MB immutable blocks
• For random reads, you have to sequentially walk through records each time
• Write-once, read-many design
• Namenode can be a contention point
• Slow failure detection

Hadoop strengths
• Massive volumes (TB to PB)
• HA, compression
• Ever-growing and maturing ecosystem for parallel compute and analytics
• Storage systems like Isilon now offer an HDFS interface
• Optimized for virtual machines
SQL + IMDG (objects) + HDFS
• Data in many shapes – support multiple data models
• Main-memory based, distributed, low-latency data store for big data
• Operational data is the focus; it is (mostly) in memory
• All data and history in HDFS

SQL + IMDG (objects) + HDFS
• Replication or partitioning
• Storage model: in-memory, in-memory with local disk, or in-memory with HDFS persistence

SQL + IMDG (objects) + HDFS
• SQL engine – designed for online/OLTP, transactions
• IMDG caching features – readThru, writeBehind, etc.

SQL + IMDG (objects) + HDFS
• Tight HDFS integration – streaming and read/write cases
• Analytics can run on HDFS data without going through the in-memory tier – sequential walk-through or incremental processing
• With parallel ingestion, you get near-real-time visibility of data for deep analytics

SQL + IMDG (objects) + HDFS
• The MR 'reduce' phase can emit results directly to the in-memory tier
• A closed loop between real time and analytics
GemFire XD – a Pivotal HD service
• GemFire: clustering, in-memory storage, HA, replication, WAN, events, distributed queues…
• + SQLFire: SQL engine – cost-based optimizer, in-memory indexing, distributed transactions, RDB integration…
• + Pivotal HD: integrated install and config; Command Center – monitoring, optimizations to Hadoop
• Working set in memory, geo-replicated; history and time series in HDFS
• SQL; objects, JSON
The real-time latency spectrum
• Machine latency (milliseconds) – GemFire XD, online/OLTP/operational DBs
• Human interactions (seconds)
• Interactive reports (seconds to minutes) – analytics, data warehousing: Pivotal HD, HAWQ
• Batch processing (minutes to hours)
Real time on top of Hadoop – who else?
Many more… Most are focused on interactive queries for analytics.
Design patterns
• Streaming ingest – consume unbounded event streams
  – Write fast into memory; stream all writes to HDFS for batch analytics
    • e.g., maintain the latest price for each security in memory; time series in HDFS
    • Continuously ingest click streams, audit trails, or interaction data
  – Trap interactions or OLTP transactions, do in-line stream processing (actionable insights), and write results or raw state into HDFS
Design patterns
• High-performance operational database
  – Keep operational data in memory; history in HDFS is randomly accessible
    • e.g., last 1 month of trades in memory, but all history is accessible at some cost
  – Take analytic output from Hadoop/SQL analytics and make it visible to online apps
Agenda • How do you use this? SQL syntax and demo
In-Memory Partitioning & Replication
Explore features using a simple STAR schema
FLIGHTS ---------------------------------------------
FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, …
PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER)

FLIGHTAVAILABILITY ---------------------------------------------
FLIGHT_ID CHAR(6) NOT NULL, SEGMENT_NUMBER INTEGER NOT NULL, FLIGHT_DATE DATE NOT NULL, ECONOMY_SEATS_TAKEN INTEGER, …
PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE)
FOREIGN KEY (FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS (FLIGHT_ID, SEGMENT_NUMBER)

FLIGHTHISTORY ---------------------------------------------
FLIGHT_ID CHAR(6), SEGMENT_NUMBER INTEGER, ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, DEST_AIRPORT CHAR(3), …

Relationships: 1 – M, 1 – 1

SEVERAL CODE/DIMENSION TABLES ---------------------------------------------
AIRLINES: airline information (very static)
COUNTRIES: list of countries served by flights
CITIES:
MAPS: photos of regions served

Assume thousands of FLIGHTS rows and millions of FLIGHTAVAILABILITY records.
Creating tables

CREATE TABLE AIRLINES (
  AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
  AIRLINE_FULL VARCHAR(24),
  BASIC_RATE DOUBLE PRECISION,
  DISTANCE_DISCOUNT DOUBLE PRECISION, …
);
Replicated tables

CREATE TABLE AIRLINES (
  AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
  AIRLINE_FULL VARCHAR(24),
  BASIC_RATE DOUBLE PRECISION,
  DISTANCE_DISCOUNT DOUBLE PRECISION, …
) REPLICATE;

[Diagram: every GF XD server hosts a full copy of the replicated table]

Design pattern: replicate reference tables in STAR schemas (seldom changed, often referenced in queries).
Partitioned tables

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL)
PARTITION BY COLUMN (FLIGHT_ID);

[Diagram: each GF XD server hosts one partition of the table]

Design pattern: partition fact tables in STAR schemas for load balancing (large, write-heavy).
Partitioned but highly available

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL)
PARTITION BY COLUMN (FLIGHT_ID)
REDUNDANCY 1;

[Diagram: each GF XD server hosts its own partition plus a redundant copy of another server's partition]

Design pattern: increase redundant copies for HA and for load balancing queries across replicas.
Colocation for related data

CREATE TABLE FLIGHTAVAILABILITY (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL, …
PARTITION BY COLUMN (FLIGHT_ID)
COLOCATE WITH (FLIGHTS);

[Diagram: each server's FLIGHTAVAILABILITY partition is colocated with the matching FLIGHTS partition, along with its redundant copies]

Design pattern: colocate related tables for maximum join performance.
Native disk-resident tables (operation logging)

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL, …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT;

sqlf backup /export/fileServerDirectory/sqlfireBackupLocation

The data dictionary is always persisted in each server.
Demo environment
• SQL client connecting via jdbc:sqlfire://localhost:1527
• Virtual machine running a GemFire XD locator and three GemFire XD servers
• Pulse (monitoring)

Demo: replicated and partitioned tables
Agenda • HDFS integration architecture and demo
Effortless HDFS integration
• Options
  – Fast streaming writes
  – Random read/write
  – With or without time series
Streaming all writes to HDFS

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL, …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT
HDFSSTORE streamingstore WRITEONLY;

CREATE HDFSSTORE streamingstore
  NAMENODE hdfs://PHD1:8020
  DIR /stream-tables
  BATCHSIZE 10
  BATCHTIMEINTERVAL 2000
  QUEUEPERSISTENT true;
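The BATCHSIZE / BATCHTIMEINTERVAL pair above means queued events are flushed to HDFS when either the batch fills or the interval elapses. A toy Java sketch of that policy (class and method names invented; the real store flushes asynchronously from a persistent queue):

```java
import java.util.*;

// Sketch of size-or-time batch flushing for streamed HDFS writes.
class HdfsEventQueue {
    private final int batchSize;
    private final long batchTimeMillis;
    private final List<String> queue = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();  // stands in for HDFS files
    private long lastFlush;

    HdfsEventQueue(int batchSize, long batchTimeMillis, long now) {
        this.batchSize = batchSize;
        this.batchTimeMillis = batchTimeMillis;
        this.lastFlush = now;
    }

    void append(String event, long now) {
        queue.add(event);
        // Flush when the batch is full OR the time interval has expired
        if (queue.size() >= batchSize || now - lastFlush >= batchTimeMillis) {
            flushed.add(new ArrayList<>(queue));   // one sequential HDFS write
            queue.clear();
            lastFlush = now;
        }
    }

    List<List<String>> flushedBatches() { return flushed; }
}
```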
Read and write to HDFS

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL, …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT
HDFSSTORE RWStore;

CREATE HDFSSTORE RWStore
  NAMENODE hdfs://PHD1:8020
  DIR /indexed-tables
  BATCHSIZE 10
  BATCHTIMEINTERVAL 2000
  QUEUEPERSISTENT true;
Write path – streaming to HDFS
[Diagram: a SQL client writes to a GemFire XD server; the update goes to the in-memory table bucket and to its backup bucket on another member, each appending to a local append-only store; a DFS client then streams batches through the HDFS NameNode and DataNode into the directory /GFXD/APP/FLIGHTS/BucketN]
In-memory partitioned data is colocated with the HDFS DataNode.
Directory structure in HDFS
[Diagram: the read/write table /GFXD/APP.FLIGHTS has per-bucket directories /0, /1, … each holding bloom, index, and data files (0-1-XXX.hop, 0-2-XXX.hop, 1-1-XXX.hop, 1-2-XXX.hop); the write-only table /GFXD/APP.FLIGHT_HISTORY has per-bucket directories holding data files only (0-1-XXX.shop, 0-2-XXX.shop, 1-1-XXX.shop, 1-2-XXX.shop)]
Time-stamped records allow incremental Map/Reduce jobs.
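Why time stamps enable incremental jobs: each run only has to process records written after the previous run's checkpoint. A minimal sketch, assuming a timestamp-keyed log (the class name is invented for illustration):

```java
import java.util.*;

// Sketch of incremental processing over time-stamped records.
class IncrementalScan {
    // Return only the records written strictly after the checkpoint timestamp;
    // the next run passes its own last-seen timestamp as the new checkpoint.
    static SortedMap<Long, String> newSince(TreeMap<Long, String> log, long checkpoint) {
        return log.tailMap(checkpoint + 1);
    }
}
```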
Read/write with compaction
[Diagram: the same write path as before – client write, in-memory bucket plus backup, local append-only store, DFS client streaming to the NameNode/DataNode under /GFXD/APP/FLIGHTS/BucketN]
Now with sorting! …and compaction.
A log-structured merge tree (like HBase, Cassandra).
Read path for HDFS tables
[Diagram: a SQL client reads through a GemFire XD server; on a miss in the in-memory bucket, the DFS client checks the bloom filters and indexes of the HDFS files through a block cache, then fetches matching data from the DataNode]
A short-circuit read path is used for local blocks; the block cache avoids I/O for bloom and index lookups.
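The bloom-filter check on this read path can be sketched in a few lines: before touching a file's data, a small in-memory filter can prove a key is definitely absent, skipping the read entirely. This is an illustrative toy (two fixed hash functions; real filters tune bit counts and hashes per file):

```java
import java.util.BitSet;

// Sketch of a bloom filter used to skip files on the read path.
class BloomFilter {
    private final BitSet bits;
    private final int size;

    BloomFilter(int size) { this.size = size; this.bits = new BitSet(size); }

    private int[] hashes(String key) {
        int h1 = Math.floorMod(key.hashCode(), size);
        int h2 = Math.floorMod(key.hashCode() * 31 + 17, size);
        return new int[] { h1, h2 };
    }

    void add(String key) {
        for (int h : hashes(key)) bits.set(h);
    }

    // false => the key is definitely not in this file: skip the disk read;
    // true  => the key might be present (false positives are possible)
    boolean mightContain(String key) {
        for (int h : hashes(key)) if (!bits.get(h)) return false;
        return true;
    }
}
```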
Tiered compaction
• Async writes allow lock-free sequential I/O… but more files mean slower reads
• Compactions balance read/write throughput
  – Minor compactions merge small files into bigger files
  – Major compactions merge all files into one single file
[Diagram: Level 0 holds many small bloom/index/data files in time order; minor compactions produce fewer, larger Level 1 files; further compaction yields Level 2]
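At its core a compaction is a k-way merge of individually sorted files into one larger sorted file. A minimal sketch over integer keys (lists stand in for files; a real compaction also drops overwritten and deleted versions):

```java
import java.util.*;

// Sketch of a minor compaction: merge several sorted "files" into one.
class Compactor {
    static List<Integer> compact(List<List<Integer>> sortedFiles) {
        // Priority queue of {value, fileIndex, offset}: always emit the smallest head
        PriorityQueue<int[]> pq =
            new PriorityQueue<int[]>(Comparator.comparingInt((int[] a) -> a[0]));
        for (int f = 0; f < sortedFiles.size(); f++)
            if (!sortedFiles.get(f).isEmpty())
                pq.add(new int[] { sortedFiles.get(f).get(0), f, 0 });

        List<Integer> merged = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            merged.add(top[0]);
            int next = top[2] + 1;
            List<Integer> file = sortedFiles.get(top[1]);
            if (next < file.size()) pq.add(new int[] { file.get(next), top[1], next });
        }
        return merged;  // one sorted run: reads now consult a single file
    }
}
```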
"Closed loop" with analytics
[Diagram: GemFire XD streams table data into time-ordered Level 0/Level 1 files in HDFS; Map/Reduce, Pivotal HAWQ, and Hive read those files via an InputFormat, and results can flow back into GemFire XD tables via an OutputFormat]
Demo environment with PivotalHD
• SQL client connecting via jdbc:sqlfire://localhost:1527
• Virtual machine running a GemFire XD locator and three GemFire XD servers
• PivotalHD NameNode and DataNode
• Pulse (monitoring)

Demo: HDFS tables
Operational vs. historical data
• Operational data is retained in memory for fast access
• User-supplied criteria identify operational data
  – Enforced on incoming updates or periodically
• Query hints or connection properties control use of historical data

CREATE TABLE flights_history (…)
  PARTITION BY PRIMARY KEY
  EVICTION BY CRITERIA (LAST_MODIFIED_DURATION > 300000)
  EVICTION FREQUENCY 60 SECONDS
  HDFSSTORE (bar);

SELECT * FROM flights_history --PROPERTIES queryHDFS = true
WHERE orig_airport = 'PDX' AND miles > 1000
ORDER BY dest_airport
Agenda • MapReduce integration and demo
Hadoop Map/Reduce
• Map/Reduce is a framework for processing massive data sets in parallel
  – The Mapper acts on local file splits to transform individual data elements
  – The Reducer receives all values for a key and generates an aggregate result
  – The Driver provides the job configuration
  – InputFormat and OutputFormat define the data source and sink
• Hadoop manages job execution
[Diagram: mappers on each node feed the shuffle, which routes each key to a reducer]
The InputFormat supplies local data, the Mapper transforms it, Hadoop sorts the keys, the Reducer generates the aggregate result, and the OutputFormat writes the result.
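The map → shuffle → reduce flow above can be simulated in-process to make the data movement concrete. A word-count-style sketch (illustrative; Hadoop performs the same grouping across nodes, with keys sorted):

```java
import java.util.*;

// In-process sketch of the Map/Reduce flow: map emits (key, 1) pairs,
// the "shuffle" groups values by key, and reduce sums each group.
class MiniMapReduce {
    static Map<String, Integer> wordCount(List<String> records) {
        // Map + shuffle: group mapper output by key (TreeMap mimics Hadoop's key sort)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records)
            for (String word : record.split("\\s+"))
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);

        // Reduce: each key sees all of its values and emits one aggregate
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, v) -> result.put(k, v.stream().mapToInt(i -> i).sum()));
        return result;
    }
}
```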
Map/Reduce with GemFire XD
• Users can execute Hadoop Map/Reduce jobs against GemFire XD data using
  – EventInputFormat to read data from HDFS without impacting online availability or performance
  – SqlfOutputFormat to write data into a SQL table for immediate use by online applications
[Diagram: the Mapper reads file splits via EventInputFormat; the Reducer writes through SqlfOutputFormat over jdbc:sqlfire://localhost:1527 into GemFire XD table foo as batched PUT INTO foo (…) VALUES (?, ?, …) statements]
Demo: Map/Reduce
Using the InputFormat – Mapper

// count each airport present in a FLIGHT_HISTORY row
public class SampleMapper extends MapReduceBase
    implements Mapper<Object, Row, Text, IntWritable> {

  public void map(Object key, Row row,
      OutputCollector<Text, IntWritable> output,
      Reporter reporter) throws IOException {
    try {
      IntWritable one = new IntWritable(1);
      ResultSet rs = row.getRowAsResultSet();
      String origAirport = rs.getString("ORIG_AIRPORT");
      String destAirport = rs.getString("DEST_AIRPORT");
      output.collect(new Text(origAirport), one);
      output.collect(new Text(destAirport), one);
    } catch (SQLException e) {
      // …
    }
  }
}
JobConf conf = new JobConf(getConf());
conf.setJobName("Busy Airport Count");
conf.set(EventInputFormat.HOME_DIR, hdfsHomeDir);
conf.set(EventInputFormat.INPUT_TABLE, tableName);
conf.setInputFormat(EventInputFormat.class);
conf.setMapperClass(SampleMapper.class);
...
Use Spring Hadoop for job configuration

<beans:beans …>
  <job id="busyAirportsJob" libs="…"
       input-format="com.vmware.sqlfire.internal.engine.hadoop.mapreduce.EventInputFormat"
       output-path="${flights.intermediate.path}"
       mapper="demo.sqlf.mr2.BusyAirports.SampleMapper"
       combiner="org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer"
       reducer="org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer" />
  <job id="topBusyAirportJob" libs="${LIB_DIR}/sqlfire-mapreduce-1.0-SNAPSHOT.jar"
       input-path="${flights.intermediate.path}"
       output-path="${flights.output.path}"
       mapper="demo.sqlf.mr2.TopBusyAirport.TopBusyAirportMapper"
       reducer="demo.sqlf.mr2.TopBusyAirport.TopBusyAirportReducer"
       number-reducers="1" />
  …
</beans:beans>
Using the OutputFormat – Reducer

// find the max, aka the busiest airport
public class TopBusyAirportReducer extends MapReduceBase
    implements Reducer<Text, StringIntPair, Key, BusyAirportModel> {

  public void reduce(Text token, Iterator<StringIntPair> values,
      OutputCollector<Key, BusyAirportModel> output,
      Reporter reporter) throws IOException {
    String topAirport = null;
    int max = 0;
    while (values.hasNext()) {
      StringIntPair v = values.next();
      if (v.getSecond() > max) {
        max = v.getSecond();
        topAirport = v.getFirst();
      }
    }
    BusyAirportModel busy = new BusyAirportModel(topAirport, max);
    output.collect(null, busy);
  }
}
JobConf conf = new JobConf(getConf());
conf.setJobName("Top Busy Airport");
conf.set(SqlfOutputFormat.OUTPUT_URL, "jdbc:sqlfire://localhost:1527");
conf.set(SqlfOutputFormat.OUTPUT_SCHEMA, "APP");
conf.set(SqlfOutputFormat.OUTPUT_TABLE, "BUSY_AIRPORT");
conf.setReducerClass(TopBusyAirportReducer.class);
conf.setOutputKeyClass(Key.class);
conf.setOutputValueClass(BusyAirportModel.class);
conf.setOutputFormat(SqlfOutputFormat.class);
...
Where do the results go?
• Reduced values are automatically inserted into the output table by matching column names

public class BusyAirportModel {
  private String airport;
  private int flights;

  public BusyAirportModel(String airport, int flights) {
    this.airport = airport;
    this.flights = flights;
  }

  public void setFlights(int idx, PreparedStatement ps) throws SQLException {
    ps.setInt(idx, flights);
  }

  public void setAirport(int idx, PreparedStatement ps) throws SQLException {
    ps.setString(idx, airport);
  }
}

PUT INTO BUSY_AIRPORT (flights, airport) VALUES (?, ?)
PUT INTO BUSY_AIRPORT (flights, airport) VALUES (?, ?)
…
Scaling application logic with parallel "data aware" procedures
Why not Map/Reduce?
[Image comparing traditional Map/Reduce with parallel "data aware" procedures; source: UC Berkeley Spark project]
• Procedures are managed in Spring containers as beans
• Java stored procedures may be created according to the SQL standard
• SQLFire also supports the JDBC type Types.JAVA_OBJECT; a parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object

CREATE PROCEDURE getOverBookedFlights ()
LANGUAGE JAVA PARAMETER STYLE JAVA
READS SQL DATA DYNAMIC RESULT SETS 1
EXTERNAL NAME 'examples.OverBookedStatus.getOverBookedStatus';
Data-aware procedures
Parallelize the procedure and prune execution to the nodes with the required data. Extend the procedure call with the following syntax:

CALL [PROCEDURE] procedure_name
  ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] } |
    { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

Hint the data the procedure depends on:

CALL getOverBookedFlights( )
  ON TABLE FLIGHTAVAILABILITY
  WHERE FLIGHT_ID = 'AA1116';

If the table is partitioned by columns in the WHERE clause, the procedure execution is pruned to the nodes with the data (the node with 'AA1116' in this case).
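The pruning decision above can be sketched in Java: hash the partitioning-column value to find the owning node, and route the call there instead of broadcasting. This is an illustrative toy router (class and method names invented; the real product also accounts for redundant copies and rebalancing):

```java
import java.util.*;

// Sketch of data-aware procedure routing by partition key.
class ProcedureRouter {
    private final int nodeCount;

    ProcedureRouter(int nodeCount) { this.nodeCount = nodeCount; }

    // Hash the partitioning column to the node owning that bucket
    int ownerOf(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), nodeCount);
    }

    // A single-key predicate prunes the call to one node;
    // no predicate means fan-out to every node.
    List<Integer> targetNodes(String partitionKeyOrNull) {
        if (partitionKeyOrNull == null) {
            List<Integer> all = new ArrayList<>();
            for (int n = 0; n < nodeCount; n++) all.add(n);
            return all;
        }
        return List.of(ownerOf(partitionKeyOrNull));
    }
}
```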
Parallelize the procedure, then aggregate (reduce)

CALL [PROCEDURE] procedure_name
  ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] } |
    { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

Register a Java result processor (optional in some cases) to aggregate the results the servers return to the client.
High-density storage in memory – off the Java heap
• Off-heap storage to minimize JVM copying and GC (MemScale)
• An off-heap memory manager for Java
  – The JVM memory manager was not designed for this volume
  – We believe TB-memory machines are now commodity class
• Key principles
  – Avoid defragmentation and compaction of data blocks through reusable buffer pools
  – Avoid all the copying in Java heaps (young gen "from" → "to" → old gen → user-to-kernel copy → network copy), then repeated on the replica side
• Hadoop exacerbates the copying problem
  – Multiple JVMs are involved: TaskTracker (JVM) → DataNode (JVM) → file system/network
  – Let alone all the copies and intermediate disk storage required in MR shuffling
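The basic off-heap idea can be shown with NIO direct buffers: memory allocated outside the Java heap that the garbage collector never scans, compacts, or copies. A minimal sketch (the fixed-slot layout is invented for illustration; MemScale's actual manager is far more elaborate):

```java
import java.nio.ByteBuffer;

// Sketch of off-heap record storage using a direct (non-heap) buffer.
class OffHeapRecords {
    private static final int SLOT = 8;       // one long per record in this toy
    private final ByteBuffer buffer;         // allocated outside the Java heap

    OffHeapRecords(int capacity) {
        this.buffer = ByteBuffer.allocateDirect(capacity * SLOT);
    }

    // Absolute puts/gets: no per-record Java objects, so no GC pressure
    void write(int slot, long value) { buffer.putLong(slot * SLOT, value); }

    long read(int slot) { return buffer.getLong(slot * SLOT); }
}
```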
Integration with Spring XD (future)
• Spring XD is a distributed, extensible framework for ingestion, real-time analytics, and batch processing
• GemFire XD as a source and sink
• Spring XD's runtime (DIRT) is pluggable; GemFire XD could be an optional runtime
Comparison to HBase
Some HBase 0.9x challenges
• HBase itself is not inherently HA; HDFS is
  – Failed region servers can cause pauses
• WAL writes have to go synchronously to HDFS (and its replicas)
  – HDFS inherently detects failures slowly (it assumes overload)
• Probability of hotspots
  – Regions are sorted, not distributed by a random hash
• WAN replication needs a lot of work
• No backup and recovery
Some HBase 0.9x challenges
• No real querying – just key-based range scans
  – And an on-disk LSM tree is suboptimal to a B+tree for querying
• You cannot execute transactions or integrate with RDBs
• Some like the ColumnFamily data model; really?
  – Pros: self-describing; a nested model is possible
  – Cons: querying is difficult and query-engine optimization is difficult; mapping is your problem; bloat
Learn More. Stay Connected.
Learn more:
Jags – jramnarayan at gopivotal.com Anthony – abaker at gopivotal.com
http://communities.vmware.com/community/vmtn/appplatform/vfabric_sqlfire
Twitter: twitter.com/springsource
YouTube: youtube.com/user/SpringSourceDev
Google+: plus.google.com/+springframework
Extras
Consistency model
• Replication within the cluster is always eager and synchronous
• Row updates are always atomic; no need to use transactions
• FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued
Consistency Model without Transactions
• Consistency in partitioned tables
  – A partitioned-table row is owned by one member at a point in time
  – All updates are serialized to replicas through the owner
  – "Total ordering" at the row level: atomic and isolated
• Membership changes and consistency – that would need another hour :-)
• Pessimistic concurrency support using SELECT FOR UPDATE
• Support for referential integrity
Distributed Transactions
• Full support for distributed transactions
• Supports READ_COMMITTED and REPEATABLE_READ
• Highly scalable without any centralized coordinator or lock manager
• We make some important assumptions
  – Most OLTP transactions are small in duration and size
  – W-W conflicts are very rare in practice
• How does it work?
  – Each data node has a sub-coordinator to track transaction state
  – Write locks are eagerly acquired on each replica
  – An object is owned by a single primary at a point in time
  – Fail fast if a lock cannot be obtained
  – Atomic, and works with the cluster failure-detection system
  – Isolated until commit for READ_COMMITTED
  – Only local isolation is supported during commit
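The eager, fail-fast locking described above can be sketched with a lock table: a transaction takes a write lock on each row it touches and aborts immediately on conflict instead of waiting, which is cheap precisely because W-W conflicts are assumed rare. Illustrative only (class names invented; no replicas or recovery here):

```java
import java.util.*;

// Sketch of eager, fail-fast write locking for distributed transactions.
class TxLockTable {
    private final Map<String, String> writeLocks = new HashMap<>(); // row -> txId

    // true => lock obtained (or already held by this tx);
    // false => W-W conflict: the caller must abort immediately (fail fast)
    synchronized boolean tryLock(String row, String txId) {
        String owner = writeLocks.putIfAbsent(row, txId);
        return owner == null || owner.equals(txId);
    }

    // Release every lock held by a transaction on commit or abort
    synchronized void releaseAll(String txId) {
        writeLocks.values().removeIf(owner -> owner.equals(txId));
    }
}
```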
GFXD in-memory performance benchmark
How does it perform? Does it scale?
• Scale from 2 to 10 servers (one per host)
• Scale from 200 to 1200 simulated clients (10 hosts)
• Single partitioned table: int PK, 40 fields (20 ints, 20 strings)
• CPU% remained low per server – about 30%, indicating many more clients could be handled
Is latency low at scale?
• Latency decreases with server capacity
• 50–70% of operations take < 1 millisecond
• About 90% take less than 2 milliseconds