Post on 05-Jan-2016

Introduction of HBase

Reporter: Hu Yi

2009-3-11

Overview

HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed Computing Environment.

Data is logically organized into tables, rows and columns.

Outline

Data Model
Architecture and Implementation
Examples & Tests

Conceptual View

A data row has a sortable row key and an arbitrary number of columns.

A timestamp is assigned automatically if the client does not supply one.

Column names take the form <family>:<label>.

Row key            Time stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com”   “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com”    “CNN”
                   t13                               “anchor:my.look.ca”   “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”
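The conceptual view can be read as a nested sorted map: row key, then column name, then timestamp, then value. The following is a minimal sketch of that idea using only the Java standard library. The class and method names (ConceptualViewSketch, put, getLatest, firstRowKey) are hypothetical and are not part of the HBase API, and timestamps are modeled as plain long values.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the conceptual view: row key -> column -> timestamp -> value.
// Hypothetical class for illustration; not an HBase type.
class ConceptualViewSketch {
    // Rows sorted by key, columns sorted by name, versions sorted newest first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell was never written.
    String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Smallest row key, showing that rows are kept in sorted order.
    String firstRowKey() {
        return rows.firstKey();
    }

    public static void main(String[] args) {
        ConceptualViewSketch table = new ConceptualViewSketch();
        table.put("com.cnn.www", "contents:", 5, "<html>v5");
        table.put("com.cnn.www", "contents:", 6, "<html>v6");
        table.put("com.apache.www", "anchor:apache.com", 10, "APACHE");
        System.out.println(table.getLatest("com.cnn.www", "contents:"));
        System.out.println(table.firstRowKey());
    }
}
```

A cell that was never written simply has no entry, which is why a row can have an arbitrary number of columns without wasting space.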

Physical Storage View

Physically, tables are stored on a per-column family basis.

Because the storage format is column-oriented, empty cells are not stored.

Each column family is managed by an HStore.

Row key            Time stamp   Column “contents:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
“com.cnn.www”      t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t10          “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”    “CNN”
                   t8           “anchor:my.look.ca”   “CNN.com”

[Figure: structure of an HStore. An HStore holds a Memcache and MapFiles; each MapFile consists of a data MapFile (key/value pairs) and an index MapFile (index keys).]
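As a rough illustration of per-family storage, the sketch below groups written cells by column family and sorts each group by row key and column ascending, then timestamp descending. It uses only the Java standard library. Cell, STORE_ORDER, and groupByFamily are hypothetical names introduced here, not HBase classes, and the sort order is an assumption based on the ordering this deck describes for stored data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the physical view: one sorted list of cells per column family.
class PhysicalViewSketch {
    // One stored cell. Hypothetical class, not an HBase type.
    static final class Cell {
        final String row, family, label, value;
        final long timestamp;

        Cell(String row, String family, String label, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.label = label;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Row key ascending, then column label ascending, then timestamp descending.
    static final Comparator<Cell> STORE_ORDER =
            Comparator.comparing((Cell c) -> c.row)
                      .thenComparing((Cell c) -> c.label)
                      .thenComparing(Comparator.comparingLong((Cell c) -> c.timestamp).reversed());

    // Groups cells by family. Only cells that were written appear at all,
    // so an empty cell costs nothing in any family's store.
    static Map<String, List<Cell>> groupByFamily(List<Cell> cells) {
        Map<String, List<Cell>> stores = new TreeMap<>();
        for (Cell c : cells) {
            stores.computeIfAbsent(c.family, f -> new ArrayList<>()).add(c);
        }
        for (List<Cell> store : stores.values()) {
            store.sort(STORE_ORDER);
        }
        return stores;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("com.cnn.www", "contents", "", 5, "<html>5"));
        cells.add(new Cell("com.cnn.www", "contents", "", 6, "<html>6"));
        cells.add(new Cell("com.apache.www", "contents", "", 12, "<html>12"));
        cells.add(new Cell("com.cnn.www", "anchor", "cnnsi.com", 9, "CNN"));
        for (Map.Entry<String, List<Cell>> e : groupByFamily(cells).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " cells");
        }
    }
}
```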

Row Ranges: Regions

Data is sorted by row key and column in ascending order, and by timestamp in descending order.

Physically, tables are broken into row ranges called regions. Each region contains the rows from its start key to its end key.

Row key   Time stamp   Column “contents:”   Column “anchor:”
aaaa      t15                               anchor:cc   value
          t13          ba
          t12          bb
          t11                               anchor:cd   value
          t10          bc
aaab      t14
aaac                                        anchor:be   value
aaad                                        anchor:ad   value
aaae      t5           ae
          t3           af
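Because regions are contiguous ranges of sorted row keys, finding the region for a row can be sketched as a floor lookup on region start keys. The example below uses only the Java standard library. RegionLookupSketch, addRegion, and regionFor are hypothetical names, and the sketch assumes that each region's end key is exclusive and equal to the next region's start key.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy region index: maps each region's start key to a region name.
class RegionLookupSketch {
    // The empty string is used as the start key of the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The region holding a row is the one with the greatest start key <= the row key.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch index = new RegionLookupSketch();
        index.addRegion("", "region-1");      // rows before "aaac"
        index.addRegion("aaac", "region-2");  // rows from "aaac" onward
        System.out.println(index.regionFor("aaab"));
        System.out.println(index.regionFor("aaae"));
    }
}
```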

Outline

Data Model
Architecture and Implementation
Examples & Tests

Three major components

The HBaseMaster

The HRegionServer

The HBase client

HBaseMaster

Assign regions to HRegionServers.

1. ROOT region locates all the META regions.

2. META region maps a number of user regions.

3. Assign user regions to the HRegionServers.

Enable/Disable table and change table schema

Monitor the health of each Server

[Figure: region hierarchy. The ROOT region points to the META regions, and each META region points to USER regions.]

ROOT/META Table

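Each row in the ROOT and META tables is approximately 1 KB in size. At the default region size of 256 MB, one catalog region holds about 2^18 rows, so:

ROOT table: 1
META regions: 2^18
USER regions: 2^18 x 2^18 = 2^36
Addressable data: 2^36 regions x 256 MB = 2^54 KB = 2^64 bytes = 2^24 TB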
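The capacity arithmetic above can be checked with a few lines of Java. This is only a worked calculation under the figures stated on the slide (1 KB per catalog row, 256 MB per region). The class and constant names are hypothetical.

```java
import java.math.BigInteger;

// Worked arithmetic for the ROOT/META addressing capacity.
class CatalogCapacitySketch {
    static final long ROW_SIZE_BYTES = 1L << 10;       // about 1 KB per catalog row
    static final long REGION_SIZE_BYTES = 256L << 20;  // 256 MB per region

    // Rows that fit in one catalog region: 2^28 / 2^10 = 2^18.
    static long rowsPerRegion() {
        return REGION_SIZE_BYTES / ROW_SIZE_BYTES;
    }

    // One ROOT region maps 2^18 META regions, each mapping 2^18 user regions.
    static long userRegions() {
        return rowsPerRegion() * rowsPerRegion();
    }

    // 2^36 user regions of 2^28 bytes each; BigInteger because 2^64 overflows a long.
    static BigInteger addressableBytes() {
        return BigInteger.valueOf(userRegions()).multiply(BigInteger.valueOf(REGION_SIZE_BYTES));
    }

    public static void main(String[] args) {
        System.out.println("rows per catalog region: " + rowsPerRegion());
        System.out.println("user regions: " + userRegions());
        System.out.println("addressable bytes: " + addressableBytes());
        System.out.println("addressable TB: " + addressableBytes().shiftRight(40));
    }
}
```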

HRegionServer

Write Requests
Read Requests
Cache Flushes
Compactions
Region Splits

HRegionServer: Write Requests

[Figure: write path. Components shown: the HLog, Hstore1 with Memcache1, Mapfile1.1 and Mapfile1.2, and Hstore2 with Memcache2. The data shown is the example table with the “contents:” and “anchor:” columns.]

HRegionServer: Read Requests

[Figure: read path. Components shown: Hstore1 with Memcache1, Mapfile1.1 and Mapfile1.2. The data shown is the example table with the “contents:” and “anchor:” columns.]

HRegionServer: Cache Flushes

[Figure: cache flush. Components shown: the HLog and Hstore1 with Memcache1, Mapfile1.1 and Mapfile1.2 before the flush; Mapfile1.1, Mapfile1.2 and a new Mapfile1.3 after the flush.]

HRegionServer: Compactions

[Figure: compaction. Components shown: Hstore1 with Memcache1; Mapfile1.1 and Mapfile1.2 are combined into Mapfile1.]

HRegionServer: Region Splits

[Figure: region split. Components shown: Hstore1 with Memcache1 and Mapfile1, holding the example table with the “contents:” and “anchor:” columns.]
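The write, read, flush, and compaction operations listed for the HRegionServer can be sketched with a small in-memory model. This is a simplified illustration using only the Java standard library, not the actual HBase implementation: HStoreSketch and its members are hypothetical names, each key holds a single value (no timestamps or versions), the "files" are in-memory sorted maps, and region splits are left out. The ordering of steps (log first, then cache; read the cache before the files, newest file first) is an assumption consistent with the components named in the figures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy HStore: a write-ahead log, an in-memory cache, and immutable sorted "files".
class HStoreSketch {
    final List<String> hlog = new ArrayList<>();                      // stands in for the HLog
    TreeMap<String, String> memcache = new TreeMap<>();               // stands in for the Memcache
    final List<TreeMap<String, String>> mapFiles = new ArrayList<>(); // oldest file first

    // Write: append to the log first, then update the in-memory cache.
    void write(String key, String value) {
        hlog.add(key + "=" + value);
        memcache.put(key, value);
    }

    // Read: check the cache, then the files from newest to oldest.
    String read(String key) {
        if (memcache.containsKey(key)) {
            return memcache.get(key);
        }
        for (int i = mapFiles.size() - 1; i >= 0; i--) {
            String value = mapFiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Cache flush: the current cache becomes a new sorted file and a fresh cache starts.
    void flush() {
        if (memcache.isEmpty()) {
            return;
        }
        mapFiles.add(memcache);
        memcache = new TreeMap<>();
    }

    // Compaction: merge all files into one; values from newer files win.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFiles) {
            merged.putAll(file);
        }
        mapFiles.clear();
        if (!merged.isEmpty()) {
            mapFiles.add(merged);
        }
    }

    public static void main(String[] args) {
        HStoreSketch store = new HStoreSketch();
        store.write("com.cnn.www", "<html>old");
        store.flush();
        store.write("com.cnn.www", "<html>new");
        store.flush();
        System.out.println("files before compaction: " + store.mapFiles.size());
        store.compact();
        System.out.println("files after compaction: " + store.mapFiles.size());
        System.out.println(store.read("com.cnn.www"));
    }
}
```

Keeping the files immutable means a flush never rewrites existing data. The cost is that reads may have to consult several files, which is what compaction reduces.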

HBase Client

[Figure: client lookup. The HBase client contacts the ROOT region, then a META region, then the user region. The location information is cached.]
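The client lookup with caching can be sketched as follows, using only the Java standard library. All names here (ClientLookupSketch, Region, addUserRegion, locate, catalogLookups) are hypothetical, the catalog is an in-memory map rather than a remote service, and the sketch assumes exclusive end keys with an empty end key meaning "no upper bound". It also assumes a META region with the empty start key exists, so every row is covered.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy client: ROOT -> META -> user region lookup with a location cache.
class ClientLookupSketch {
    // A user region and the server hosting it. Hypothetical class.
    static final class Region {
        final String startKey, endKey, server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    // ROOT maps a META start key to a META region; a META region maps
    // a user region start key to that user region.
    private final TreeMap<String, TreeMap<String, Region>> root = new TreeMap<>();
    private final TreeMap<String, Region> cache = new TreeMap<>();
    int catalogLookups = 0; // counts simulated ROOT and META requests

    void addUserRegion(String metaStartKey, Region region) {
        root.computeIfAbsent(metaStartKey, k -> new TreeMap<>()).put(region.startKey, region);
    }

    Region locate(String row) {
        // Cached location: no catalog traffic at all.
        Map.Entry<String, Region> cached = cache.floorEntry(row);
        if (cached != null && cached.getValue().contains(row)) {
            return cached.getValue();
        }
        // Step 1: ask the ROOT region which META region covers the row.
        TreeMap<String, Region> metaRegion = root.floorEntry(row).getValue();
        catalogLookups++;
        // Step 2: ask that META region which user region covers the row.
        Region region = metaRegion.floorEntry(row).getValue();
        catalogLookups++;
        // Step 3: remember the location, then talk to the user region directly.
        cache.put(region.startKey, region);
        return region;
    }

    public static void main(String[] args) {
        ClientLookupSketch client = new ClientLookupSketch();
        client.addUserRegion("", new Region("", "m", "server-1"));
        client.addUserRegion("", new Region("m", "", "server-2"));
        System.out.println(client.locate("apple").server);
        System.out.println(client.locate("banana").server);
        System.out.println("catalog lookups: " + client.catalogLookups);
    }
}
```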

Outline

Data Model
Architecture and Implementation
Examples & Tests

Create MyTable

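// config is the cluster configuration; creating it here is an assumption, the slide does not show it.
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = new HBaseAdmin(config);

// Declare the two column families of the table.
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");

// Describe the table, attach the families, and create it.
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);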

Row Key   Timestamp   columnFamily1:   columnFamily2:

Insert Values

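// table is assumed to be an open HTable for "MyTable"; timestamp is a long supplied by the caller.
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);

// Each put names the cell as <family>:<label>.
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));

// Send all puts for this row in one commit.
table.commit(batchUpdate);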

Row Key   Timestamp   columnFamily1:
myRow     ts1         labela   labela value
          ts2         labelb   labelb value

Insert

[Figure: insert benchmark charts comparing HBase and MySQL. Axis label: time (ms). One panel is labeled “Row*10, Column=1”.]

Search

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”    “CNN”
                   t8           “anchor:my.look.ca”   “CNN.com”
                   t6
                   t5
                   t3

Select value from table where key='com.apache.www' AND label='anchor:apache.com'

Search: Scanner

Select value from table where anchor='cnnsi.com'

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com”   “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com”    “CNN”
                   t8           “anchor:my.look.ca”   “CNN.com”
                   t6
                   t5
                   t3
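The two search styles differ in how much data they touch. A lookup by row key and column goes straight to one cell, while a search by column alone has to scan the rows. The sketch below contrasts the two using only the Java standard library. SearchSketch, put, get, and scan are hypothetical names, not the HBase client API, and only the latest value of each cell is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy table for comparing a point lookup with a full scan.
class SearchSketch {
    // Row key -> (column name -> value); latest version only.
    private final TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();

    void put(String row, String column, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Point lookup, like: where key = row AND label = column.
    String get(String row, String column) {
        TreeMap<String, String> columns = table.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Scan, like: where the column matches, with no row key given.
    // Visits every row in key order and collects the matching values.
    List<String> scan(String column) {
        List<String> matches = new ArrayList<>();
        for (TreeMap<String, String> columns : table.values()) {
            String value = columns.get(column);
            if (value != null) {
                matches.add(value);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SearchSketch t = new SearchSketch();
        t.put("com.apache.www", "anchor:apache.com", "APACHE");
        t.put("com.cnn.www", "anchor:cnnsi.com", "CNN");
        t.put("com.cnn.www", "anchor:my.look.ca", "CNN.com");
        System.out.println(t.get("com.apache.www", "anchor:apache.com"));
        System.out.println(t.scan("anchor:cnnsi.com"));
    }
}
```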

Summary

Column-oriented storage makes modification more flexible.

Performance is higher when access is clustered by row key.

Future work

More testing

Search optimization

Thank you