Introduction of HBase

Reporter: Hu Yi

2009-3-11

Overview

HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed Computing Environment.

Data is logically organized into tables, rows and columns.

Outline

Data Model

Architecture and Implementation

Examples & Tests

Conceptual View

A data row has a sortable row key and an arbitrary number of columns.

A timestamp is assigned automatically unless the client supplies one.

Column names take the form <family>:<label>.

Row key           Timestamp  Column "contents:"  Column "anchor:"
"com.apache.www"  t12        "<html>…"
                  t11        "<html>…"
                  t10                            "anchor:apache.com" = "APACHE"
"com.cnn.www"     t15                            "anchor:cnnsi.com" = "CNN"
                  t13                            "anchor:my.look.ca" = "CNN.com"
                  t6         "<html>…"
                  t5         "<html>…"
                  t3         "<html>…"
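The conceptual view can be read as a sorted map of maps: row key, then column name, then timestamp (newest first), then value. The following is a minimal sketch of that reading in plain Java collections. It is not HBase code; the class and method names (ConceptualViewSketch, put, getLatest) are made up for illustration.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the conceptual view; names are hypothetical, not HBase API.
class ConceptualViewSketch {
    // row key -> "family:label" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell is empty.
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Rows are kept sorted by row key.
    String firstRowKey() {
        return rows.firstKey();
    }

    public static void main(String[] args) {
        ConceptualViewSketch table = new ConceptualViewSketch();
        table.put("com.cnn.www", "contents:", 5L, "<html>v5");
        table.put("com.cnn.www", "contents:", 6L, "<html>v6");
        table.put("com.apache.www", "anchor:apache.com", 10L, "APACHE");

        System.out.println(table.getLatest("com.cnn.www", "contents:"));        // <html>v6
        System.out.println(table.getLatest("com.cnn.www", "anchor:cnnsi.com")); // null (empty cell)
        System.out.println(table.firstRowKey());                                // com.apache.www
    }
}
```

An empty cell simply has no entry in the inner maps, which is why it costs nothing to store.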

Physical Storage View

Physically, tables are stored on a per-column family basis.

Because the storage format is column-oriented, empty cells are not stored.

Each column family is managed by an HStore.

Row key           Timestamp  Column "contents:"
"com.apache.www"  t12        "<html>…"
                  t11        "<html>…"
"com.cnn.www"     t6         "<html>…"
                  t5         "<html>…"
                  t3         "<html>…"

Row key           Timestamp  Column "anchor:"
"com.apache.www"  t10        "anchor:apache.com" = "APACHE"
"com.cnn.www"     t9         "anchor:cnnsi.com" = "CNN"
                  t8         "anchor:my.look.ca" = "CNN.com"

[Figure: HStore. Labels: Memcache, Data MapFile (Key/Value), Index MapFile (Index key).]

Row Ranges: Regions

Sort order: row key and column ascending, timestamp descending.

Physically, tables are broken into row ranges (regions), each containing the rows from a start key to an end key.

Row key  Timestamp  Column "contents:"  Column "anchor:"
aaaa     t15                            anchor:cc = value
         t13        ba
         t12        bb
         t11                            anchor:cd = value
         t10        bc
aaab     t14
aaac                                    anchor:be = value
aaad                                    anchor:ad = value
aaae     t5         ae
         t3         af
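Because regions are contiguous row ranges, finding the region for a row amounts to finding the greatest region start key that is not larger than the row key. A minimal sketch of that lookup with a sorted map follows. The region names and the split point "aaac" are assumptions chosen for illustration, and the class is not part of HBase.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy region directory; names are hypothetical, not HBase API.
class RegionLookupSketch {
    // start key of each region -> region name; "" is the start key of the first region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The owning region is the one with the greatest start key <= row key.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");     // rows before "aaac"
        table.addRegion("aaac", "region-2"); // rows from "aaac" onward

        System.out.println(table.regionFor("aaab")); // region-1
        System.out.println(table.regionFor("aaac")); // region-2
        System.out.println(table.regionFor("aaae")); // region-2
    }
}
```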

Outline

Data Model

Architecture and Implementation

Examples & Tests

Three major components

The HBaseMaster

The HRegionServer

The HBase client

HBaseMaster

Assign regions to HRegionServers.

1. The ROOT region locates all the META regions.

2. Each META region maps a number of user regions.

3. The master assigns user regions to the HRegionServers.

Enable or disable tables and change table schemas.

Monitor the health of each server.

[Figure: one ROOT region pointing to two META regions, which point to three USER regions.]

ROOT/META Table

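Each row in the ROOT and META tables is approximately 1 KB in size. At the default region size of 256 MB, one region holds about 2^18 rows, so:

ROOT table:    maps 2^18 META regions
META regions:  map 2^18 × 2^18 = 2^36 USER regions
USER regions:  2^36 × 256 MB = 2^54 KB = 2^64 bytes = 2^24 TB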

HRegionServer

Write Requests

Read Requests

Cache Flushes

Compactions

Region Splits

Write Requests

[Figure: write path. Labels: HLog, Hstore1, Hstore2, Memcache1, Memcache2, Mapfile1.1, Mapfile1.2, shown next to the example table from the Conceptual View.]

HRegionServer

Read Requests

[Figure: read path. Labels: Hstore1, Memcache1, Mapfile1.1, Mapfile1.2, shown next to the example table from the Conceptual View.]

HRegionServer

Cache Flushes

[Figure: cache flush. Labels: HLog, Hstore1, Memcache1, Mapfile1.1, Mapfile1.2, and a second group Mapfile1.1, Mapfile1.2, Mapfile1.3, shown next to the example table from the Conceptual View.]

HRegionServer

Compactions

[Figure: compaction. Labels: Hstore1, Memcache1, Mapfile1.1, Mapfile1.2, Mapfile1, shown next to the example table from the Conceptual View.]
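The HRegionServer slides name a write path (HLog plus Memcache), a read path (Memcache, then MapFiles), cache flushes and compactions. The following toy store is one way to picture how these fit together. It is a simplified sketch under stated assumptions: keys are plain strings, timestamps and versions are ignored, the "HLog" is an in-memory list that is never trimmed, and "MapFiles" are in-memory sorted maps. None of the names below are HBase classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy single-store model; names are hypothetical, not HBase API.
class HStoreSketch {
    final List<String> hlog = new ArrayList<>();                      // stand-in for the HLog
    TreeMap<String, String> memcache = new TreeMap<>();               // recent writes, in memory
    final List<TreeMap<String, String>> mapFiles = new ArrayList<>(); // flushed files, oldest first

    // Write: record the edit in the log first, then apply it to the memcache.
    void write(String key, String value) {
        hlog.add(key + "=" + value);
        memcache.put(key, value);
    }

    // Read: check the memcache, then the MapFiles from newest to oldest.
    String read(String key) {
        if (memcache.containsKey(key)) return memcache.get(key);
        for (int i = mapFiles.size() - 1; i >= 0; i--) {
            String value = mapFiles.get(i).get(key);
            if (value != null) return value;
        }
        return null;
    }

    // Cache flush: the memcache becomes a new MapFile and a fresh memcache starts.
    void flush() {
        if (memcache.isEmpty()) return;
        mapFiles.add(memcache);
        memcache = new TreeMap<>();
    }

    // Compaction: merge all MapFiles into one; values from newer files win.
    void compact() {
        if (mapFiles.size() < 2) return;
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFiles) {
            merged.putAll(file); // oldest first, so later files overwrite
        }
        mapFiles.clear();
        mapFiles.add(merged);
    }

    public static void main(String[] args) {
        HStoreSketch store = new HStoreSketch();
        store.write("a", "1");
        store.flush();               // first MapFile
        store.write("a", "2");
        store.write("b", "3");
        store.flush();               // second MapFile
        store.compact();             // both merged into one

        System.out.println(store.mapFiles.size()); // 1
        System.out.println(store.read("a"));       // 2
        System.out.println(store.read("b"));       // 3
    }
}
```

Reading newest-first and merging oldest-first are the two places where ordering matters: both ensure that the most recent write to a key is the one that is returned.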

HRegionServer

Region Splits

[Figure: region split. Labels: Hstore1, Memcache1, Mapfile1, shown next to the example table from the Conceptual View.]

HBase Client

[Figure sequence: the HBase client and, in turn, the ROOT region, a META region, and the user region.]

1. ROOT region

2. META region

3. User region

Information cached
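The client slides suggest a three-step walk (ROOT, then META, then the user region) whose result is cached. Below is a minimal sketch of that idea with in-memory maps standing in for the catalog regions. All names (ClientLookupSketch, Region, locate, the server names) are hypothetical, and the sketch assumes the catalog covers the whole key space.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy client-side region locator; names are hypothetical, not HBase API.
class ClientLookupSketch {
    static final class Region {
        final String startKey;
        final String endKey; // "" means no upper bound
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    // ROOT: start key of a META region -> that META region's rows.
    // META region: start key of a user region -> user region location.
    final TreeMap<String, TreeMap<String, Region>> root = new TreeMap<>();
    final TreeMap<String, Region> cache = new TreeMap<>();
    int catalogReads = 0;

    Region locate(String row) {
        Map.Entry<String, Region> cached = cache.floorEntry(row);
        if (cached != null && cached.getValue().contains(row)) {
            return cached.getValue();                 // cache hit: no catalog access
        }
        catalogReads++;                               // step 1: ROOT gives the META region
        TreeMap<String, Region> metaRegion = root.floorEntry(row).getValue();
        catalogReads++;                               // step 2: META gives the user region
        Region region = metaRegion.floorEntry(row).getValue();
        cache.put(region.startKey, region);           // step 3: remember the location
        return region;
    }

    public static void main(String[] args) {
        ClientLookupSketch client = new ClientLookupSketch();
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("", new Region("", "aaac", "server-A"));
        meta.put("aaac", new Region("aaac", "", "server-B"));
        client.root.put("", meta);

        System.out.println(client.locate("aaab").server); // server-A
        System.out.println(client.locate("aaaa").server); // server-A, served from the cache
        System.out.println(client.catalogReads);          // 2
        System.out.println(client.locate("aaad").server); // server-B
        System.out.println(client.catalogReads);          // 4
    }
}
```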

Outline

Data Model

Architecture and Implementation

Examples & Tests

Create MyTable

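import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// config: an HBaseConfiguration created elsewhere
HBaseAdmin admin = new HBaseAdmin(config);
// One descriptor per column family; each family name ends with ':'
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);
admin.createTable(desc);

Resulting table (empty):

Row Key  Timestamp  columnFamily1:  columnFamily2: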

Insert Values

import org.apache.hadoop.hbase.io.BatchUpdate;

// table: an open HTable for "MyTable"; timestamp: a long
BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
table.commit(batchUpdate);

Row Key  Timestamp  columnFamily1:
myRow    ts1        labela = labela value
         ts2        labelb = labelb value

Insert

[Figure: insert test, series HBase; vertical axis 0 to 160000, horizontal axis 1 to 100000 in powers of ten.]

Insert

[Figure: insert test, series HBase and MySQL; vertical axis time (ms), 1 to 1000000 in powers of ten; horizontal axis labeled "Row*10, Column=1", 10 to 100000 in powers of ten.]

Search

Row key           Timestamp  Column "anchor:"
"com.apache.www"  t12
                  t11
                  t10        "anchor:apache.com" = "APACHE"
"com.cnn.www"     t9         "anchor:cnnsi.com" = "CNN"
                  t8         "anchor:my.look.ca" = "CNN.com"
                  t6
                  t5
                  t3

Select value from table where key='com.apache.www' AND label='anchor:apache.com'

Search

Scanner

Select value from table where anchor='cnnsi.com'

Row key           Timestamp  Column "anchor:"
"com.apache.www"  t12
                  t11
                  t10        "anchor:apache.com" = "APACHE"
"com.cnn.www"     t9         "anchor:cnnsi.com" = "CNN"
                  t8         "anchor:my.look.ca" = "CNN.com"
                  t6
                  t5
                  t3
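The two Search slides contrast a lookup that names the row key with one that does not. A minimal sketch of the difference follows, using a plain sorted map rather than HBase itself: a keyed lookup goes straight to the row, while a search without a row key has to visit every row, as a scanner would. The class and method names are assumptions for illustration, and versions are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy table for comparing keyed lookup with a full scan; not HBase API.
class SearchSketch {
    // row key -> ("family:label" -> value)
    final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, String>()).put(column, value);
    }

    // Keyed search: the row key leads directly to the row.
    String get(String row, String column) {
        TreeMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Scanner-style search: visit every row and test for the column.
    List<String> scan(String column) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) matches.add(row.getKey() + " -> " + value);
        }
        return matches;
    }

    public static void main(String[] args) {
        SearchSketch table = new SearchSketch();
        table.put("com.apache.www", "anchor:apache.com", "APACHE");
        table.put("com.cnn.www", "anchor:cnnsi.com", "CNN");
        table.put("com.cnn.www", "anchor:my.look.ca", "CNN.com");

        System.out.println(table.get("com.apache.www", "anchor:apache.com")); // APACHE
        System.out.println(table.scan("anchor:cnnsi.com"));                   // [com.cnn.www -> CNN]
    }
}
```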

Summary

Column-oriented storage makes modification more flexible.

Higher performance on row key clusters.

Future work

More testing

Search optimization

Thank you

