Roman Nikitchenko, 16.10.2014
Crazy dances on the elephant back
Apache HBase
www.vitech.com.ua
YARN
A 10,000-node computer... Recent technology changes are focused on higher scale: better resource usage and control, lower MTTR, higher security, redundancy and fault tolerance.
FIRST EVER DATA OS
● Hadoop is an open source framework for big data: both distributed storage and processing.
● Hadoop is reliable and fault tolerant without relying on hardware for these properties.
● Hadoop has unique horizontal scalability: currently from a single computer up to thousands of cluster nodes.
Why Hadoop?
What is HADOOP INDEED?
HBase motivation
● Hadoop is designed for throughput, not for latency.
● HDFS blocks are expected to be large; there is a known issue with lots of small files.
● Write once, read many times ideology.
● MapReduce is not very flexible, and neither is any database built on top of it.
● How about realtime?
Beware...
HBase motivation
BUT WE OFTEN NEED...
LATENCY, SPEED and all Hadoop properties.
Agenda
It's not only about HBase.
Architecture, data model, features.
HBASE as is
INTEGRATION
But we are always special, aren't we?
Something special
● Open source Google BigTable implementation with its proper place in the infrastructure.
● Limited but strict ACID guarantees.
● Realtime, low latency, linear scalability.
● Distributed, reliable and fault tolerant.
● Natural integration with the Hadoop infrastructure.
● Really good for massive scans.
● Server-side user operations.
● No SQL at all.
● Secondary indexing is pretty complex.
MANIFEST
High layer applications
Resource management (YARN)
Distributed file system (HDFS)
KEY USERS
HBase: the story begins with ...
2006: Google BigTable paper is published; HBase development starts.
2007: First code is released as part of Hadoop 0.15. Focus is on offline, crawl data storage.
2008: HBase goes OLTP (online transaction processing); 0.20 is the first performance release.
2010: HBase becomes an Apache top-level project.
November 2010: Facebook elects HBase to implement its new messaging platform.
Later, HBase 0.92 is considered the production-ready release.
Loose data structure
Book: title, author, pages, price
Ball: color, size, material, price
Toy car: color, type, radio control, price
Kind    | Price | Title | Author | Pages | Color | Size | Material | Type | Radio control
Book    |   +   |   +   |   +    |   +   |       |      |          |      |
Ball    |   +   |       |        |       |   +   |  +   |    +     |      |
Toy car |   +   |       |        |       |   +   |      |          |  +   |      +
● Data looks like tables with a large number of columns.
● The column set can vary from row to row.
● No table modification is needed to add a column to a row.
Book #1: Kind, Price, Title, Author, Pages
Ball #1: Kind, Price, Color, Size, Material
Toy car #1: Price, Color, Type +Radio control
Book #2: Kind, Price, Title, Author
HBase: it is NoSQL
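The loose structure above can be sketched as a toy model (plain Python dicts, not the HBase API; the sample row keys and values are made up): each row carries only its own columns, and adding a column to one row touches no schema.

```python
# Toy model of HBase's loose row structure (not the real HBase API):
# a table is a dict of row key -> {column: value}. Different rows can
# carry completely different column sets, and adding a new column to
# one row requires no table modification at all.
table = {}

def put(row_key, column, value):
    table.setdefault(row_key, {})[column] = value

put("book#1", "Kind", "book")
put("book#1", "Title", "Moby-Dick")       # hypothetical sample values
put("ball#1", "Kind", "ball")
put("ball#1", "Color", "red")
put("toycar#1", "Price", "9.99")
put("toycar#1", "Radio control", "yes")   # only this row has the column
```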
Table → Regions → Rows
Row: Key | Family #1: Column, Column, ... | Family #2: ... | ...
Data is placed in tables.
Tables are split into regions based on row key ranges.
Columns are grouped into families.
Every table row is identified by a unique row key.
Every row consists of columns.
Logical data model
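The region split in the logical model above boils down to a sorted search over region start keys. A toy sketch (the boundaries are made-up examples, not real HBase internals):

```python
import bisect

# Toy model of the logical split: regions are contiguous row key
# ranges, so locating a row's region is a sorted search over the
# region start keys.
region_starts = ["", "g", "p"]  # region 0: ["", "g"), 1: ["g", "p"), 2: ["p", ...)

def region_for(row_key):
    # The row belongs to the region with the greatest start key <= row_key.
    return bisect.bisect_right(region_starts, row_key) - 1
```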
Table → Regions → Rows
Row: Key | Family #1: Column, Column, ... | Family #2: ... | ...
● Data is stored in HFiles.
● Families are stored on disk in separate files.
● Row keys are indexed in memory.
● A column includes key, qualifier, value and timestamp.
● No column limit.
● Storage is block based.
HFile: family #1
Row key Column Value TS
... ... ... ...
... ... ... ...
HFile: family #2
Row key Column Value TS
... ... ... ...
... ... ... ...
● Delete is just another marker record.
● Periodic compaction is required.
Real data model
DATA
META
RS RS RS RS
Client Master Zookeeper
Zookeeper coordinates distributed elements and is primary contact point for client.
Master server keeps metadata and manages data distribution over Region servers.
Region servers manage data table regions.
Clients directly communicate with region server for data.
Clients locate the master through ZooKeeper, then the needed regions through the master.
HBase: infrastructure view
DATA
META
Rack
DN DN
RS RS
Rack
DN DN
RS RS
Rack
DN DN
RS RS
NameNode
Client
Master Zookeeper
Zookeeper coordinates distributed elements and is primary contact point for client.
Master server keeps metadata and manages data distribution over Region servers.
Region servers manage data table regions.
Actual data storage service including replication is on HDFS data nodes.
Clients directly communicate with region server for data.
Clients locate the master through ZooKeeper, then the needed regions through the master.
Together with HDFS
KEY OPERATIONS
GET: get a data element by key (rows, columns).
PUT: no difference whether we add data or replace existing data.
DELETE: delete a single object.
SCAN: a massive GET with a key range.
BATCH OPERATIONS ARE POSSIBLE
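A toy model of the four operations over a sorted key space (plain Python, not the HBase client API; row keys and values are illustrative). SCAN is just a massive GET over a [start, stop) row key range.

```python
# Toy model of the four key operations on a sorted key space.
table = {}

def put(row_key, value):          # add or replace -- no difference
    table[row_key] = value

def get(row_key):                 # single data element by key
    return table.get(row_key)

def delete(row_key):              # single object by key
    table.pop(row_key, None)

def scan(start, stop):            # every row with start <= key < stop
    return [(k, table[k]) for k in sorted(table) if start <= k < stop]
```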
CLOSER VIEW
● The actual write goes to a region server; the master is not involved.
● All requests first go to the WAL (write ahead log) to provide recovery.
● The region server keeps a MemStore as temporary storage.
● Only when needed is the write flushed to disk (into an HFile).
CRUD: Put and Delete
● The lower layer is a write-once filesystem (HDFS), so the PUT and DELETE paths are identical: DELETE is just another marker added.
● Both PUT and DELETE requests are per row key; there is no row key range for DELETE.
● The actual DELETE is performed during compactions.
WHY IS IT FAST?
Memory is intensively used. Writes are logged and cached in memory. Reads are just cached.
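The write path can be sketched as a toy model (illustrative Python; the flush threshold is a made-up tiny number to force a flush): log to the WAL first, cache in the MemStore, flush a sorted snapshot only when full.

```python
# Toy model of the write path: every mutation is logged to the WAL
# first (for recovery), cached in the in-memory MemStore, and only
# flushed to an HFile-like sorted snapshot when the MemStore fills up.
FLUSH_THRESHOLD = 2  # made-up tiny limit so a flush happens in the demo

wal = []       # write ahead log: replayable after a crash
memstore = {}  # in-memory temporary storage on the region server
hfiles = []    # flushed immutable snapshots, sorted by row key

def put(row_key, value):
    wal.append(("put", row_key, value))   # log first, for recovery
    memstore[row_key] = value
    if len(memstore) >= FLUSH_THRESHOLD:
        flush()

def flush():
    hfiles.append(sorted(memstore.items()))  # HFiles are key-sorted
    memstore.clear()
```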
CRUD: Get and Scan
● Both Get and Scan can include client filters: expressions that are processed on the server side and can significantly limit the results, and therefore the traffic.
● Both Scan and Get operations can be performed on several column families.
● A Get operation is a simple data request by row key.
● A Scan operation is performed over a row key range, which can involve several table regions.
● Get is implemented through Scan.
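A toy sketch of why server-side filters cut traffic (plain Python; table contents and the "color" column are illustrative): the predicate runs next to the data, so only matching rows are returned to the client.

```python
# Toy model of a server-side scan filter: the predicate is evaluated
# during the scan, so non-matching rows never cross the wire.
table = {"row1": {"color": "red"},
         "row2": {"color": "blue"},
         "row3": {"color": "red"}}

def scan(start, stop, row_filter=None):
    out = []
    for key in sorted(table):
        if not (start <= key < stop):
            continue
        row = table[key]
        if row_filter is None or row_filter(key, row):
            out.append((key, row))   # only filtered rows are "sent"
    return out
```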
● Coprocessors are a feature that allows extending HBase without modifying the product code.
● A RegionObserver can attach code to operations on the region level.
● Similar functionality exists for the Master.
● Endpoints are the way to provide functionality equivalent to stored procedures.
● Together, the coprocessor infrastructure can provide a realtime distributed processing framework (a lightweight MapReduce).
SERVER SIDE TRICKS
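The RegionObserver idea can be sketched as stacked hooks around a region operation (toy Python, not the HBase coprocessor API; the counting observer is a hypothetical example):

```python
# Toy model of stacked region observers: each observer's hook runs
# around the region operation, without touching the core put logic.
class CountingObserver:
    """Hypothetical observer that counts puts hitting its region."""
    def __init__(self):
        self.pre_puts = 0

    def pre_put(self, row_key, row):
        self.pre_puts += 1

region = {}
observers = [CountingObserver(), CountingObserver()]  # observers stack

def put(row_key, row):
    for obs in observers:        # every stacked observer sees the op
        obs.pre_put(row_key, row)
    region[row_key] = row
```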
Request
Coprocessors: Region observer
Client
Table
Region observer Region observer
Result
Region Region
RegionServer RegionServer
A region observer works like a hook on region operations.
Region observers can be stacked.
RegionServer RegionServer
Coprocessors: Endpoints
Request (RPC)
Client Table
Region Region
Direct communication via separate protocol.
Response
Endpoint Endpoint
Your commands can take effect on table regions.
WHY SERVER SIDE IS BLACK MAGIC?
YOU ARE MODIFYING REGION SERVER OR MASTER CODE
ANY MISTAKE LEADS TO HELL
JAVA CLASS LOADER REQUIRES SERVICE RESTART ON RELOAD
ANY MODIFICATION LEADS TO HELL
Integration with MapReduce
INTEGRATION
DATA
META
Integration with MapReduce
● HBase provides a number of classes for native MapReduce integration. The main point is data locality.
● TableInputFormat allows massive MapReduce table processing (it maps the table with one region per mapper).
● HBase classes like Result (Get / Scan result) or Put (Put request) can be passed between MapReduce job stages.
● There is not much difference between MR1 and YARN here.
DataNode
NameNode JobTracker TaskTracker
RegionServer HMaster
Often a single node, so data is local.
MAP+REDUCE + HBASE
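The one-region-per-mapper idea can be sketched as a toy split computation (plain Python; region boundaries and row keys are made-up examples, not the real TableInputFormat code):

```python
# Toy model of TableInputFormat's split strategy: one input split per
# table region, so each mapper reads only its own region's rows.
regions = [("", "g"), ("g", "p"), ("p", "\uffff")]  # (start, stop) pairs
rows = {"apple": 1, "grape": 2, "kiwi": 3, "pear": 4, "plum": 5}

def make_splits():
    # One split (one future mapper) per region boundary pair.
    return [
        [k for k in sorted(rows) if start <= k < stop]
        for start, stop in regions
    ]
```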
Bulk load
● HBase table data is mapped with one mapper per table region, so mapped data is processed locally.
● After local (!) mapping the data is reduced. This can be non-local processing, but it is much lighter.
● So we get almost 100% distributed, local data processing across the Hadoop cluster.
HBase table
Mapper
Mapper
Mapper
Table region
Table region
Table region
Mappers Reducers
MAP REDUCE CLASSICS
Reducer
Bulk load
● There is the ability to load data into a table MUCH FASTER.
● HBase internal storage files (HFiles) are prepared directly.
● It is preferable to generate one HFile per table region; MapReduce can be used.
● Each prepared HFile is merged with the table storage at maximum speed.
Data importers
HFile generator
HFile generator
HFile generator
Table region
Table region
Table region
Mappers Reducers
HFile
HFile
HFile
BULK LOAD
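The bulk load path above can be sketched as a toy model (illustrative Python with made-up region boundaries, not real HFile generation): partition records by region, emit one sorted file per region, then just adopt the files into the stores instead of replaying puts through the write path.

```python
import bisect

# Toy model of bulk load: importers partition records by region
# boundary, reducers emit one sorted HFile-like list per region, and
# "loading" merges those files directly -- no WAL, no MemStore.
region_starts = ["", "g", "p"]  # made-up region boundaries

def build_hfiles(records):
    buckets = [[] for _ in region_starts]
    for key, value in records:
        i = bisect.bisect_right(region_starts, key) - 1
        buckets[i].append((key, value))
    return [sorted(b) for b in buckets]  # one sorted file per region

def bulk_load(stores, hfiles):
    # Adoption step: each prepared file is merged into its region store.
    for store, hfile in zip(stores, hfiles):
        store.extend(hfile)
        store.sort()
```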
● HBase has no secondary indexing out of the box.
● A coprocessor (RegionObserver) is used to track Put and Delete operations and update an index table.
● Scan operations with an index column filter are intercepted and processed based on the index table content.
Table
Client
Index table
Region observer
Put / Delete
Index update
Scan with filter
Region
Index search
SECONDARY INDEX THROUGH COPROCESSORS
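The scheme above can be sketched as a toy model (plain Python, not the coprocessor API; the "color" column is an illustrative example): a region-observer-style hook keeps a separate index table in sync on every Put and Delete, and filtered scans are answered from the index.

```python
# Toy model of a coprocessor-maintained secondary index: every Put or
# Delete on the main table also updates the index table, and a
# filtered scan is answered from the index instead of a full scan.
data = {}    # main table: row key -> color value
index = {}   # index table: color value -> set of row keys

def put(row_key, color):
    old = data.get(row_key)
    if old is not None:
        index[old].discard(row_key)          # hook: drop stale entry
    data[row_key] = color
    index.setdefault(color, set()).add(row_key)  # hook: index update

def delete(row_key):
    color = data.pop(row_key, None)
    if color is not None:
        index[color].discard(row_key)        # hook keeps index in sync

def scan_by_color(color):
    # Intercepted filtered scan: answered from the index table.
    return sorted(index.get(color, set()))
```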
● SOLR indexes documents. What is stored in the SOLR index is not what you index: SOLR is NOT A STORAGE, ONLY AN INDEX.
● But it can index ANYTHING. A search result is a document ID.
INDEX UPDATE
Search responses
INDEX QUERY
An index update request is analyzed, tokenized, transformed... and the same applies to queries.
INDEX ALTERNATIVE: SOLR
● HBase handles online user data change requests.
● The NGData Lily indexer handles the stream of changes and transforms them into SOLR index change requests.
● Indexes are built in SOLR, so HBase data is searchable.
HDFS
HBase: Data and search integration
HBase regions
Data update
Client
User just puts (or deletes) data.
Search responses
Lily HBase NRT indexer
Replication can be set up down to the column family level.
REPLICATION
HBase cluster
Translates data changes into SOLR index updates.
SOLR cloud
Search requests (HTTP)
Apache Zookeeper does all coordination
Finally provides search
Serves low level file system.
Questions and discussion