Couchbase_John_Bryce_Israel_Training_couchbase_hadoop

Couchbase and Hadoop

Perry Krug

Sr. Solutions Architect

Agenda

• View basics

• Lifecycle of a view

• Index definition, build, and query phase

• Indexing details

• Replica indexes, failover and compaction

• Primary and Secondary indexes

• View best practices

• Couchbase and Elastic Search

• Couchbase and Hadoop

pol·y·glot /ˈpäliˌglät/

Adjective: Knowing or using several languages.

Noun: A person who knows several languages.

Synonyms: multilingual

per·sist·ence /pərˈsistəns/

Noun: The continued or prolonged existence of something.

Synonyms: perseverance, tenacity, pertinacity, stubbornness

Couchbase Views – The basics

• Define materialized views on JSON documents and then query across the data set

• Using views you can define:

• Primary indexes

• Simple secondary indexes (most common use case)

• Complex secondary, tertiary and composite indexes

• Aggregations (reduction)

• Documents are eventually indexed (index updates lag behind writes)

• Queries are eventually consistent with respect to documents

• Built using Map/Reduce technology

• Map and Reduce functions are written in JavaScript

View Lifecycle

Define -> Build -> Query

Buckets & Design docs & Views

• Create design documents on a bucket

• Create views within a design document

BUCKET 1

Design document 1: View 1, View 2, View 3

Design document 2: View 4, View 5

Design document 3: View 6, View 7

BUCKET 2

Distributed Indexing and Querying

[Diagram: Couchbase Server Cluster with Server 1, Server 2 and Server 3, each holding Active and Replica documents (user-configured replica count = 1). App Server 1 and App Server 2 each use the Couchbase client library with a cluster map to send "Create Index / View" and "Query" requests.]

• Indexing work is distributed amongst nodes

• Parallelize the effort

• Each node has index for data stored on it

• Queries combine the results from required nodes
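
The idea that a query combines results from the required nodes can be pictured as a scatter-gather merge. The following is a minimal sketch in plain JavaScript, assuming each node returns its rows already sorted by key. The helper `mergeNodeResults` and the sample rows are hypothetical and are not part of any Couchbase API.

```javascript
// Hypothetical helper: merge per-node result lists (each already sorted
// by key) into one sorted list, as a query coordinator might.
function mergeNodeResults(nodeResults) {
  const merged = [];
  const cursors = nodeResults.map(() => 0); // next unread row per node
  while (true) {
    let best = -1;
    for (let n = 0; n < nodeResults.length; n++) {
      const row = nodeResults[n][cursors[n]];
      if (row === undefined) continue; // this node is exhausted
      if (best === -1 || row.key < nodeResults[best][cursors[best]].key) {
        best = n;
      }
    }
    if (best === -1) break; // every node is exhausted
    merged.push(nodeResults[best][cursors[best]++]);
  }
  return merged;
}

// Sample rows, invented for illustration.
const server1 = [{ key: "Boston", id: "doc2" }, { key: "Juneau", id: "doc5" }];
const server2 = [{ key: "Austin", id: "doc1" }, { key: "Seattle", id: "doc3" }];
const server3 = [{ key: "Denver", id: "doc4" }, { key: "Portland", id: "doc6" }];

const combined = mergeNodeResults([server1, server2, server3]);
console.log(combined.map((r) => r.key).join(", "));
```

The merge only has to compare the head row of each node, which is why per-node sorted indexes keep the combining step cheap.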

Eventually indexed Views – Data flow

[Diagram: an App Server writes Doc 1 to a Couchbase Server node. Components shown: managed cache, replication queue (to other node), disk queue, disk, and view engine.]

DEFINE Index / View Definition in JavaScript

CREATE INDEX City ON Brewery.City;
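
The SQL-like statement above is shorthand for the intent; a view itself is defined by a JavaScript map function. A minimal sketch of a comparable map function follows, assuming brewery documents carry a `type` of "brewery" and a `city` field (both field names are assumptions). The `emit` function and the sample documents are local stand-ins so the snippet runs outside the server.

```javascript
// Local stand-in for the view engine: collects emitted rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function as it might be pasted into a view definition.
// Field names "type" and "city" are assumptions about the documents.
function map(doc, meta) {
  if (doc.type === "brewery" && doc.city) {
    emit(doc.city, null);
  }
}

// Sample documents, invented for illustration.
const docs = [
  { meta: { id: "brewery_a" }, doc: { type: "brewery", city: "Juneau" } },
  { meta: { id: "brewery_b" }, doc: { type: "brewery", city: "Boston" } },
  { meta: { id: "beer_x" }, doc: { type: "beer", name: "X" } },
];
docs.forEach((d) => map(d.doc, d.meta));

// The index keeps rows ordered by key; plain string comparison is used
// here as a simplification of the server's collation.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(JSON.stringify(rows));
```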

BUILD Distributed Index Build Phase

• Optimized for lookups, in-order access and aggregations

• View reads are from disk (different performance profile than GET/SET)

• Views built against every document on every node

• Group views in a design document

• Views are automatically kept up to date

QUERY Dynamic Queries with Optional Aggregation

• Eventually consistent with respect to document updates

• Efficiently fetch a document or group of similar documents

• Queries will use cached values from B-tree inner nodes when possible

• Take advantage of in-order tree traversal with group_level queries

Query: ?startkey="J"&endkey="K"

Result: {"rows":[{"key":"Juneau","value":null}]}
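
View key parameters are JSON values, so string keys keep their double quotes and need URL encoding when sent over HTTP. The sketch below builds such a query URL, assuming the view REST endpoint on port 8092. The helper `viewQueryUrl`, the bucket name, the design document name and the view name are all hypothetical.

```javascript
// Hypothetical helper: build a view query URL. Only suitable for
// JSON-valued parameters such as key, startkey and endkey.
function viewQueryUrl(host, bucket, designDoc, view, params) {
  const query = Object.keys(params)
    .map((name) => name + "=" + encodeURIComponent(JSON.stringify(params[name])))
    .join("&");
  return (
    "http://" + host + ":8092/" + bucket +
    "/_design/" + designDoc + "/_view/" + view + "?" + query
  );
}

// Bucket, design document and view names are placeholders.
const url = viewQueryUrl("localhost", "beer-sample", "breweries", "by_city", {
  startkey: "J",
  endkey: "K",
});
console.log(url);
```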

Simple Primary and Secondary Indexing

Example Document

Document ID

Define a primary index on the bucket

• Lookup the document ID / key by key, range, prefix, suffix

Index definition
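
The index definition on this slide did not survive extraction. A common way to write a primary index is a map function that emits the document ID from `meta.id`; the sketch below assumes that form. The `emit` stand-in, the `range` helper and the sample IDs are hypothetical, and plain string comparison is a simplification of the server's key collation.

```javascript
// Local stand-in for the view engine: collects emitted rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Assumed primary index definition: one row per document, keyed by ID.
function map(doc, meta) {
  emit(meta.id, null);
}

// Sample document IDs, invented for illustration.
["brewery_alaskan", "beer_1554", "beer_abbey"].forEach((id) => map({}, { id: id }));
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Hypothetical range lookup, mimicking startkey/endkey (both inclusive).
function range(startkey, endkey) {
  return rows
    .filter((r) => r.key >= startkey && r.key <= endkey)
    .map((r) => r.key);
}

// Prefix lookup expressed as a range: everything that starts with "beer_".
const beers = range("beer_", "beer_\uffff");
console.log(beers);
```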

Define a secondary index on the bucket

• Lookup an attribute by value, range, prefix, suffix

Index definition

Find documents by a specific attribute

• Let's find beers by brewery_id!

The index definition

Key | Value

The result set: beers keyed by brewery_id
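
A minimal sketch of what such an index definition could look like, assuming beer documents have `type` set to "beer" and a `brewery_id` field. The `emit` stand-in records the document ID alongside each row, as view results do; the sample documents and brewery IDs are hypothetical.

```javascript
// Local stand-in for the view engine: records id, key and value per row.
const rows = [];
let currentId = null; // id of the document being mapped (harness detail)
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Assumed map function: key each beer by its brewery_id.
function map(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, null);
  }
}

// Sample documents; the brewery_id values are invented.
const docs = [
  { id: "beer_1554", doc: { type: "beer", name: "1554 Enlightened Black Ale", brewery_id: "new_belgium" } },
  { id: "beer_abbey", doc: { type: "beer", name: "Abbey Belgian Style Ale", brewery_id: "new_belgium" } },
  { id: "beer_amber", doc: { type: "beer", name: "Amber", brewery_id: "alaskan" } },
  { id: "new_belgium", doc: { type: "brewery", name: "New Belgium Brewing" } },
];
docs.forEach((d) => {
  currentId = d.id;
  map(d.doc, { id: d.id });
});

// Equivalent of querying the view with ?key="new_belgium".
const matches = rows.filter((r) => r.key === "new_belgium").map((r) => r.id);
console.log(matches);
```

Emitting `null` as the value keeps the index small; the row's document ID is enough to fetch the full beer document afterwards.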

Query Pattern: Basic Aggregations

Use a built-in reduce function with a group query

• Let's find average abv for each brewery!

Group reduce (reduce by unique key)
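
The slide does not say which built-in reduce it uses. One way to get an average is the built-in `_stats` reduce, which reports a sum and a count per group; the average is then sum divided by count. The sketch below assumes that approach. `groupStats` is a local, hypothetical stand-in for `_stats` with grouping enabled, and the sample documents are invented.

```javascript
// Local stand-in for the view engine: collects emitted rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Assumed map function: key by brewery, value is the beer's abv.
function map(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id && typeof doc.abv === "number") {
    emit(doc.brewery_id, doc.abv);
  }
}

// Hypothetical stand-in for a grouped _stats reduce (one result per key).
function groupStats(allRows) {
  const groups = {};
  allRows.forEach((r) => {
    const g = groups[r.key] || (groups[r.key] = { sum: 0, count: 0 });
    g.sum += r.value;
    g.count += 1;
  });
  return groups;
}

// Sample documents, invented for illustration.
[
  { type: "beer", brewery_id: "new_belgium", abv: 5.5 },
  { type: "beer", brewery_id: "new_belgium", abv: 7.0 },
  { type: "beer", brewery_id: "alaskan", abv: 5.3 },
].forEach((doc) => map(doc, {}));

const stats = groupStats(rows);
const averages = {};
Object.keys(stats).forEach((k) => {
  averages[k] = stats[k].sum / stats[k].count;
});
console.log(averages);
```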

Query Pattern: Time-based Rollups

Find patterns in beer comments by time

{ "type": "comment", "about_id": "beer_Enlightened_Black_Ale", "user_id": 525, "text": "tastes like college!", "updated": "2010-07-22 20:00:20" }

{ "id": "f1e62" }

timestamp

Query with group_level=2 to get monthly rollups

group_level=3 - daily results - great for graphing
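
The rollup works because the map function emits the timestamp as an array key such as [year, month, day, ...], and group_level truncates that array before counting. A minimal sketch under those assumptions follows: `toDateArray` and `countByGroupLevel` are hypothetical local helpers (the latter stands in for a count reduce with group_level), and the extra sample comments are invented.

```javascript
// Local stand-in for the view engine: collects emitted rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Hypothetical helper: "YYYY-MM-DD HH:MM:SS" -> [y, m, d, h, min, s].
function toDateArray(timestamp) {
  return timestamp.split(/[- :]/).map(Number);
}

// Assumed map function: key each comment by its timestamp array.
function map(doc, meta) {
  if (doc.type === "comment" && doc.updated) {
    emit(toDateArray(doc.updated), 1);
  }
}

// Hypothetical stand-in for a count reduce queried with group_level=level.
function countByGroupLevel(allRows, level) {
  const counts = {};
  allRows.forEach((r) => {
    const groupKey = JSON.stringify(r.key.slice(0, level));
    counts[groupKey] = (counts[groupKey] || 0) + 1;
  });
  return counts;
}

// First timestamp is from the slide; the others are invented.
[
  "2010-07-22 20:00:20",
  "2010-07-22 21:15:00",
  "2010-07-30 09:00:00",
  "2010-08-01 10:00:00",
].forEach((updated) => map({ type: "comment", updated: updated }, {}));

const monthly = countByGroupLevel(rows, 2); // keys like [2010,7]
const daily = countByGroupLevel(rows, 3); // keys like [2010,7,22]
console.log(monthly, daily);
```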

Query Pattern: Leaderboard

Aggregate value stored in a document

• Let's find the top-rated beers!

{ "brewery": "New Belgium Brewing", "name": "1554 Enlightened Black Ale", "abv": 5.5, "description": "Born of a flood...", "category": "Belgian and French Ale", "style": "Other Belgian-Style Ales", "updated": "2010-07-22 20:00:20", "ratings": { "jchris": 5, "scalabl3": 4, "damienkatz": 1 }, "comments": [ "f1e62", "6ad8c" ] }

ratings

Sort each beer by its average rating

• Let's find the top-rated beers!

average
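
One way to sketch this: the map function averages the values in each document's `ratings` object and emits that average as the key, so the index is ordered by rating and a descending query returns the top-rated beers first. The `emit` stand-in is local, and the second sample document's ratings are invented.

```javascript
// Local stand-in for the view engine: collects emitted rows.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Assumed map function: key is the average rating, value is the beer name.
function map(doc, meta) {
  if (doc.ratings) {
    var users = Object.keys(doc.ratings);
    if (users.length > 0) {
      var sum = 0;
      for (var i = 0; i < users.length; i++) {
        sum += doc.ratings[users[i]];
      }
      emit(sum / users.length, doc.name);
    }
  }
}

[
  // Ratings taken from the slide's example document.
  { name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } },
  // Invented ratings for a second beer.
  { name: "Abbey Belgian Style Ale", ratings: { jchris: 5, scalabl3: 4 } },
  // No ratings: not indexed.
  { name: "Unrated Ale" },
].forEach((doc) => map(doc, {}));

// Equivalent of querying the view in descending key order.
rows.sort((a, b) => b.key - a.key);
console.log(rows.map((r) => r.value + ": " + r.key.toFixed(2)));
```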

Couchbase and Elastic Search

Full Text Search

{ "name": "Abbey Belgian Style Ale", "description": "Winner of four World Beer Cup medals and eight medals at the Great American Beer Fest, Abbey Belgian Ale is the Mark Spitz of New Belgium’s lineup – but it didn’t start out that way."}

Search Across Full JSON Body

Search term: abbey

Faceted Search

Categories

Items with Counts

Range Facets
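
To make the facet types concrete: a category facet counts items per distinct field value, and a range facet counts items per numeric bucket. The sketch below computes both in plain JavaScript over invented sample documents. It illustrates the concept only and does not use or mirror the Elasticsearch API; `termFacet` and `rangeFacet` are hypothetical names.

```javascript
// Sample documents, invented for illustration.
const beers = [
  { name: "Beer A", category: "Belgian and French Ale", abv: 5.5 },
  { name: "Beer B", category: "Belgian and French Ale", abv: 7.0 },
  { name: "Beer C", category: "North American Ale", abv: 4.8 },
];

// Hypothetical helper: count items per distinct value of a field.
function termFacet(items, field) {
  const counts = {};
  items.forEach((item) => {
    counts[item[field]] = (counts[item[field]] || 0) + 1;
  });
  return counts;
}

// Hypothetical helper: count items per range (from inclusive, to exclusive).
function rangeFacet(items, field, ranges) {
  return ranges.map((range) => ({
    from: range.from,
    to: range.to,
    count: items.filter((i) => i[field] >= range.from && i[field] < range.to).length,
  }));
}

const byCategory = termFacet(beers, "category");
const byAbv = rangeFacet(beers, "abv", [
  { from: 0, to: 5 },
  { from: 5, to: 10 },
]);
console.log(byCategory, byAbv);
```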

Learning Portal – Proof of Concept

Couchbase and Hadoop

Cloudera, etc.

Operational vs. Analytic Databases

Couchbase

Analytic Databases

Get insights from data

Real-time, Interactive Databases

Fast access to data

NoSQL

What is Sqoop?

Sqoop is a tool designed to transfer data between Hadoop and [OLTP] databases. You can use Sqoop to import data from [an OLTP] database management system (RDBMS) such as MySQL or Oracle [or Couchbase] into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back.

sqoop.apache.org

Traditional ETL

[Diagram: Application, Data, T (transform), Data]

A different paradigm

[Diagram: Data, Application, Data]

A very scalable different paradigm

[Diagram: several Application and Data pairs, plus a further Data store]

Where did the Transform go?

[Diagram: Application, Data, and many T (transform) steps]

Couchbase Import and Export

$ sqoop import --connect http://localhost:8091/pools --table DUMP

$ sqoop import --connect http://localhost:8091/pools --table BACKFILL_5

$ sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir DUMP

• For imports, the table must be:

– DUMP: All keys currently in Couchbase

– BACKFILL_n: All key mutations for n minutes

• The specified --username maps to the bucket

– By default set to the "default" bucket

Hadoop and Couchbase – Ad Targeting

click stream events

profiles, campaigns

profiles, real time campaign statistics

40 milliseconds to respond with the decision.

Moving Parts

Content & Recommendation Targeting

Moving Parts

Thank you

Couchbase NoSQL Document Database

Date post:	17-Jul-2015
Upload:	couchbase