Couchdoop: Connecting Hadoop with Couchbase (London HUG)

Post on 28-May-2015

description

We needed a bridge between the real-time tier, where we used Couchbase, and the batch tier, built on Hadoop. When we couldn’t find a suitable option, we built our own: Couchdoop, an open-source Hadoop connector for Couchbase. Based on our experience with Couchdoop, we will discuss best practices for creating connectors between Hadoop and NoSQL databases. We’ll address the challenges we encountered while developing Couchdoop and share how we tuned it for performance. Together with Bigstep, we will also show how much throughput can be squeezed out of a Hadoop connector. We have benchmarked Couchdoop, and we’ll talk about the behavior you can expect and the tweaks that can improve the performance of your big data setup.

transcript

Two-tier Architecture

Real-time Tier (Couchbase)
• Detects user intent
• Gives next best recommendation or deal

Data Bridge (Couchdoop)
• User events
• Recommendations

Batch Tier (Hadoop)
• Recommends products

Importing Data

{ "user": "Rudy", "action": "view", "product": "Fender Guitar" }

{ "user": "Rudy", "action": "click", "product": "Guitar Amplifier" }

{ "user": "Emma", "action": "buy", "product": "Blue Skirt" }
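To make the shape of the imported data concrete, here is a minimal sketch of how a batch job might read such events once they are in HDFS. It assumes the events are stored as one JSON document per line, which the slides do not state; the function name `group_events_by_user` is hypothetical and is not part of Couchdoop.

```python
import json
from collections import defaultdict

# Sample events copied from the slide, one JSON document per line.
# The one-document-per-line layout is an assumption for illustration.
EVENTS = """\
{"user": "Rudy", "action": "view", "product": "Fender Guitar"}
{"user": "Rudy", "action": "click", "product": "Guitar Amplifier"}
{"user": "Emma", "action": "buy", "product": "Blue Skirt"}
"""


def group_events_by_user(lines):
    """Group (action, product) pairs by user, skipping blank lines."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        grouped[event["user"]].append((event["action"], event["product"]))
    return dict(grouped)


events_by_user = group_events_by_user(EVENTS.splitlines())
```

Grouping by user is only one plausible first step; a real recommendation job would likely run this kind of aggregation as a distributed MapReduce or similar stage rather than in a single process.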

[Diagram: Couchdoop IMPORT into HDFS, feeding Hadoop (Machine Learning Recommendations)]

Exporting Data

[Diagram: Hadoop (Machine Learning Recommendations), Couchdoop EXPORT]

{ "user": "Rudy", "recommendations": [ ["Ibanez Acoustic Guitar", 450], ["Guitar Tuner", 120], ["Sound Mixer", 30] ] }
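As an illustration of the export side, the sketch below turns one user's scored products into a key and a JSON value of the shape shown on the slide, ready to be written to a key-value store. The `reco::` key prefix, the descending sort by score, and the function name `to_export_record` are assumptions made for this example; they are not taken from the slides or from Couchdoop's API.

```python
import json


def to_export_record(user, scored_products, key_prefix="reco::"):
    """Return a (key, json_value) pair for one user's recommendations.

    key_prefix is a hypothetical naming convention, not from the slides.
    """
    # Highest score first, matching the ordering in the slide's example.
    ranked = sorted(scored_products, key=lambda pair: pair[1], reverse=True)
    doc = {
        "user": user,
        "recommendations": [[name, score] for name, score in ranked],
    }
    return key_prefix + user, json.dumps(doc)


key, value = to_export_record(
    "Rudy",
    [("Guitar Tuner", 120), ("Sound Mixer", 30), ("Ibanez Acoustic Guitar", 450)],
)
```

Keeping the document keyed by user means the real-time tier can fetch a user's recommendations with a single key lookup, which is one common reason to precompute them in the batch tier.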

Updating Data

[Diagram: Hadoop (Machine Learning Recommendations), Couchdoop Update]