Roman Nikitchenko, 16.10.2014
Crazy dances on the elephant back
Apache HBase
www.vitech.com.ua
YARN
A 10,000-node computer... Recent technology changes are focused on higher scale: better resource usage and control, lower MTTR, higher security, redundancy and fault tolerance.
FIRST EVER DATA OS
● Hadoop is an open source framework for big data: both distributed storage and processing.
● Hadoop is reliable and fault tolerant without relying on hardware for these properties.
● Hadoop has unique horizontal scalability: currently from a single computer up to thousands of cluster nodes.
Why Hadoop?
What is HADOOP INDEED?
HBase motivation
● Hadoop is designed for throughput, not for latency.
● HDFS blocks are expected to be large; there is a known issue with lots of small files.
● Write once, read many times ideology.
● MapReduce is not very flexible, and neither is any database built on top of it.
● How about realtime?
Beware...
HBase motivation
BUT WE OFTEN NEED...
LATENCY, SPEED and all Hadoop properties.
Agenda
It's not only about HBase.
Architecture, data model, features.
HBASE as is
INTEGRATION
But we are always special, aren't we?
Something special
● Open source Google BigTable implementation with its proper place in the infrastructure.
● Limited but strict ACID guarantees.
● Realtime, low latency, linear scalability.
● Distributed, reliable and fault tolerant.
● Natural integration with the Hadoop infrastructure.
● Really good for massive scans.
● Server-side user operations.
● No SQL at all.
● Secondary indexing is pretty complex.
MANIFEST
High layer applications
Resource management (YARN)
Distributed file system (HDFS)
KEY USERS
HBase: the story begins with ...
2006: Google BigTable paper is published; HBase development starts.
2007: First code is released as part of Hadoop 0.15. Focus is on offline, crawl data storage.
2008: HBase goes OLTP (online transaction processing); 0.20 is the first performance release.
2010: HBase becomes an Apache top-level project.
November 2010: Facebook elects HBase to implement its new messaging platform.
Later, HBase 0.92 is considered the production-ready release.
Loose data structure
Book: title, author, pages, price
Ball: color, size, material, price
Toy car: color, type, radio control, price
Kind    | Price | Title | Author | Pages | Color | Size | Material | Type | Radio control
Book    |   +   |   +   |   +    |   +   |       |      |          |      |
Ball    |   +   |       |        |       |   +   |  +   |    +     |      |
Toy car |   +   |       |        |       |   +   |      |          |  +   |      +
● Data looks like tables with a large number of columns.
● The column set can vary from row to row.
● No table modification is needed to add a column to a row.
Book #1: Kind, Price, Title, Author, Pages
Ball #1: Kind, Price, Color, Size, Material
Toy car #1: Price, Color, Type +Radio control
Book #2: Kind, Price, Title, Author
HBase: it is NoSQL
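The loose structure above can be sketched as a toy model (plain Python dicts, not the HBase API; the sample row keys and values are made up): each row carries only its own columns, and adding a column to one row touches no schema.

```python
# Toy model of HBase's loose row structure (not the real HBase API):
# a table is a dict of row key -> {column: value}. Different rows can
# carry completely different column sets, and adding a new column to
# one row requires no table modification at all.
table = {}

def put(row_key, column, value):
    table.setdefault(row_key, {})[column] = value

put("book#1", "Kind", "book")
put("book#1", "Title", "Moby-Dick")       # hypothetical sample values
put("ball#1", "Kind", "ball")
put("ball#1", "Color", "red")
put("toycar#1", "Price", "9.99")
put("toycar#1", "Radio control", "yes")   # only this row has the column
```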
Table → Regions → Rows
Row: Key | Family #1: Column, Column, ... | Family #2: ... | ...
Data is placed in tables.
Tables are split into regions based on row key ranges.
Columns are grouped into families.
Every table row is identified by a unique row key.
Every row consists of columns.
Logical data model
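The region split in the logical model above boils down to a sorted search over region start keys. A toy sketch (the boundaries are made-up examples, not real HBase internals):

```python
import bisect

# Toy model of the logical split: regions are contiguous row key
# ranges, so locating a row's region is a sorted search over the
# region start keys.
region_starts = ["", "g", "p"]  # region 0: ["", "g"), 1: ["g", "p"), 2: ["p", ...)

def region_for(row_key):
    # The row belongs to the region with the greatest start key <= row_key.
    return bisect.bisect_right(region_starts, row_key) - 1
```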
Table → Regions → Rows
Row: Key | Family #1: Column, Column, ... | Family #2: ... | ...
● Data is stored in HFiles.
● Families are stored on disk in separate files.
● Row keys are indexed in memory.
● A column includes key, qualifier, value and timestamp.
● No column limit.
● Storage is block based.
HFile: family #1
Row key Column Value TS
... ... ... ...
... ... ... ...
HFile: family #2
Row key Column Value TS
... ... ... ...
... ... ... ...
● Delete is just another marker record.
● Periodic compaction is required.
Real data model
DATA
META
RS RS RS RS
Client Master Zookeeper
Zookeeper coordinates distributed elements and is primary contact point for client.
Master server keeps metadata and manages data distribution over Region servers.
Region servers manage data table regions.
Clients directly communicate with region server for data.
Clients locate the master through ZooKeeper, then the needed regions through the master.
HBase: infrastructure view
DATA
META
Rack
DN DN
RS RS
Rack
DN DN
RS RS
Rack
DN DN
RS RS
NameNode
Client
Master Zookeeper
Zookeeper coordinates distributed elements and is primary contact point for client.
Master server keeps metadata and manages data distribution over Region servers.
Region servers manage data table regions.
Actual data storage service including replication is on HDFS data nodes.
Clients directly communicate with region server for data.
Clients locate the master through ZooKeeper, then the needed regions through the master.
Together with HDFS
KEY OPERATIONS
GET: get a data element by key (rows, columns).
PUT: no difference whether we add data or replace existing data.
DELETE: delete a single object.
SCAN: a massive GET with a key range.
BATCH OPERATIONS ARE POSSIBLE
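A toy model of the four operations over a sorted key space (plain Python, not the HBase client API; row keys and values are illustrative). SCAN is just a massive GET over a [start, stop) row key range.

```python
# Toy model of the four key operations on a sorted key space.
table = {}

def put(row_key, value):          # add or replace -- no difference
    table[row_key] = value

def get(row_key):                 # single data element by key
    return table.get(row_key)

def delete(row_key):              # single object by key
    table.pop(row_key, None)

def scan(start, stop):            # every row with start <= key < stop
    return [(k, table[k]) for k in sorted(table) if start <= k < stop]
```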
CLOSER VIEW
● The actual write goes to a region server; the master is not involved.
● All requests first go to the WAL (write ahead log) to provide recovery.
● The region server keeps a MemStore as temporary storage.
● Only when needed is the write flushed to disk (into an HFile).
CRUD: Put and Delete
● The lower layer is a write-once filesystem (HDFS), so the PUT and DELETE paths are identical: DELETE is just another marker added.
● Both PUT and DELETE requests are per row key; there is no row key range for DELETE.
● The actual DELETE is performed during compactions.
WHY IS IT FAST?
Memory is intensively used. Writes are logged and cached in memory. Reads are just cached.
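The write path can be sketched as a toy model (illustrative Python; the flush threshold is a made-up tiny number to force a flush): log to the WAL first, cache in the MemStore, flush a sorted snapshot only when full.

```python
# Toy model of the write path: every mutation is logged to the WAL
# first (for recovery), cached in the in-memory MemStore, and only
# flushed to an HFile-like sorted snapshot when the MemStore fills up.
FLUSH_THRESHOLD = 2  # made-up tiny limit so a flush happens in the demo

wal = []       # write ahead log: replayable after a crash
memstore = {}  # in-memory temporary storage on the region server
hfiles = []    # flushed immutable snapshots, sorted by row key

def put(row_key, value):
    wal.append(("put", row_key, value))   # log first, for recovery
    memstore[row_key] = value
    if len(memstore) >= FLUSH_THRESHOLD:
        flush()

def flush():
    hfiles.append(sorted(memstore.items()))  # HFiles are key-sorted
    memstore.clear()
```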
CRUD: Get and Scan
● Both Get and Scan can include client filters: expressions that are processed on the server side and can significantly limit the results, and therefore the traffic.
● Both Scan and Get operations can be performed on several column families.
● A Get operation is a simple data request by row key.
● A Scan operation is performed over a row key range, which can involve several table regions.
● Get is implemented through Scan.
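A toy sketch of why server-side filters cut traffic (plain Python; table contents and the "color" column are illustrative): the predicate runs next to the data, so only matching rows are returned to the client.

```python
# Toy model of a server-side scan filter: the predicate is evaluated
# during the scan, so non-matching rows never cross the wire.
table = {"row1": {"color": "red"},
         "row2": {"color": "blue"},
         "row3": {"color": "red"}}

def scan(start, stop, row_filter=None):
    out = []
    for key in sorted(table):
        if not (start <= key < stop):
            continue
        row = table[key]
        if row_filter is None or row_filter(key, row):
            out.append((key, row))   # only filtered rows are "sent"
    return out
```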
● Coprocessors are a feature that allows extending HBase without modifying the product code.
● A RegionObserver can attach code to operations on the region level.
● Similar functionality exists for the Master.
● Endpoints are the way to provide functionality equivalent to stored procedures.
● Together, the coprocessor infrastructure can provide a realtime distributed processing framework (a lightweight MapReduce).
SERVER SIDE TRICKS
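The RegionObserver idea can be sketched as stacked hooks around a region operation (toy Python, not the HBase coprocessor API; the counting observer is a hypothetical example):

```python
# Toy model of stacked region observers: each observer's hook runs
# around the region operation, without touching the core put logic.
class CountingObserver:
    """Hypothetical observer that counts puts hitting its region."""
    def __init__(self):
        self.pre_puts = 0

    def pre_put(self, row_key, row):
        self.pre_puts += 1

region = {}
observers = [CountingObserver(), CountingObserver()]  # observers stack

def put(row_key, row):
    for obs in observers:        # every stacked observer sees the op
        obs.pre_put(row_key, row)
    region[row_key] = row
```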
Request
Coprocessors: Region observer
Client
Table
Region observer Region observer
Result
Region Region
RegionServer RegionServer
A region observer works like a hook on region operations.
Region observers can be stacked.
RegionServer RegionServer
Coprocessors: Endpoints
Request (RPC)
Client Table
Region Region
Direct communication via separate protocol.
Response
Endpoint Endpoint
Your commands can take effect on table regions.
WHY SERVER SIDE IS BLACK MAGIC?
YOU ARE MODIFYING REGION SERVER OR MASTER CODE
ANY MISTAKE LEADS TO HELL
JAVA CLASS LOADER REQUIRES SERVICE RESTART ON RELOAD
ANY MODIFICATION LEADS TO HELL
Integration with MapReduce
INTEGRATION
DATA
META
Integration with MapReduce
● HBase provides a number of classes for native MapReduce integration. The main point is data locality.
● TableInputFormat allows massive MapReduce table processing (it maps the table with one region per mapper).
● HBase classes like Result (Get / Scan result) or Put (Put request) can be passed between MapReduce job stages.
● There is not much difference between MR1 and YARN here.
DataNode
NameNode JobTracker TaskTracker
RegionServer HMaster
Often a single node, so data is local.
MAP+REDUCE + HBASE
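The one-region-per-mapper idea can be sketched as a toy split computation (plain Python; region boundaries and row keys are made-up examples, not the real TableInputFormat code):

```python
# Toy model of TableInputFormat's split strategy: one input split per
# table region, so each mapper reads only its own region's rows.
regions = [("", "g"), ("g", "p"), ("p", "\uffff")]  # (start, stop) pairs
rows = {"apple": 1, "grape": 2, "kiwi": 3, "pear": 4, "plum": 5}

def make_splits():
    # One split (one future mapper) per region boundary pair.
    return [
        [k for k in sorted(rows) if start <= k < stop]
        for start, stop in regions
    ]
```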
Bulk load
● HBase table data is mapped with one mapper per table region, so mapped data is processed locally.
● After local (!) mapping the data is reduced. This can be non-local processing, but it is much lighter.
● So we get almost 100% distributed, local data processing across the Hadoop cluster.
HBase table
Mapper
Mapper
Mapper
Table region
Table region
Table region
Mappers Reducers
MAP REDUCE CLASSICS
Reducer
Bulk load
● There is the ability to load data into a table MUCH FASTER.
● HBase internal storage files (HFiles) are prepared directly.
● It is preferable to generate one HFile per table region; MapReduce can be used.
● Each prepared HFile is merged with the table storage at maximum speed.
Data importers
HFile generator
HFile generator
HFile generator
Table region
Table region
Table region
Mappers Reducers
HFile
HFile
HFile
BULK LOAD
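The bulk load path above can be sketched as a toy model (illustrative Python with made-up region boundaries, not real HFile generation): partition records by region, emit one sorted file per region, then just adopt the files into the stores instead of replaying puts through the write path.

```python
import bisect

# Toy model of bulk load: importers partition records by region
# boundary, reducers emit one sorted HFile-like list per region, and
# "loading" merges those files directly -- no WAL, no MemStore.
region_starts = ["", "g", "p"]  # made-up region boundaries

def build_hfiles(records):
    buckets = [[] for _ in region_starts]
    for key, value in records:
        i = bisect.bisect_right(region_starts, key) - 1
        buckets[i].append((key, value))
    return [sorted(b) for b in buckets]  # one sorted file per region

def bulk_load(stores, hfiles):
    # Adoption step: each prepared file is merged into its region store.
    for store, hfile in zip(stores, hfiles):
        store.extend(hfile)
        store.sort()
```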
● HBase has no secondary indexing out of the box.
● A coprocessor (RegionObserver) is used to track Put and Delete operations and update an index table.
● Scan operations with an index column filter are intercepted and processed based on the index table content.
Table
Client
Index table
Region observer
Put / Delete
Index update
Scan with filter
Region
Index search
SECONDARY INDEX THROUGH COPROCESSORS
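The scheme above can be sketched as a toy model (plain Python, not the coprocessor API; the "color" column is an illustrative example): a region-observer-style hook keeps a separate index table in sync on every Put and Delete, and filtered scans are answered from the index.

```python
# Toy model of a coprocessor-maintained secondary index: every Put or
# Delete on the main table also updates the index table, and a
# filtered scan is answered from the index instead of a full scan.
data = {}    # main table: row key -> color value
index = {}   # index table: color value -> set of row keys

def put(row_key, color):
    old = data.get(row_key)
    if old is not None:
        index[old].discard(row_key)          # hook: drop stale entry
    data[row_key] = color
    index.setdefault(color, set()).add(row_key)  # hook: index update

def delete(row_key):
    color = data.pop(row_key, None)
    if color is not None:
        index[color].discard(row_key)        # hook keeps index in sync

def scan_by_color(color):
    # Intercepted filtered scan: answered from the index table.
    return sorted(index.get(color, set()))
```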
● SOLR indexes documents. What is stored in the SOLR index is not what you index: SOLR is NOT A STORAGE, ONLY AN INDEX.
● But it can index ANYTHING. A search result is a document ID.
INDEX UPDATE
Search responses
INDEX QUERY
An index update request is analyzed, tokenized, transformed... and the same applies to queries.
INDEX ALTERNATIVE: SOLR
● HBase handles online user data change requests.
● The NGData Lily indexer handles the stream of changes and transforms them into SOLR index change requests.
● Indexes are built in SOLR, so HBase data is searchable.
HDFS
HBase: Data and search integration
HBase regions
Data update
Client
User just puts (or deletes) data.
Search responses
Lily HBase NRT indexer
Replication can be set up down to the column family level.
REPLICATION
HBase cluster
Translates data changes into SOLR index updates.
SOLR cloud
Search requests (HTTP)
Apache Zookeeper does all coordination
Finally provides search
Serves low level file system.
Questions and discussion