NoSQL Couchbase Lite & BigData HPCC Systems

Post on 28-Jan-2018


Mobile Data with Couchbase Lite & Big Data HPCC Systems

By Fujio Turner

What is Couchbase Lite?

NoSQL JSON Document Database for Mobile

+Your Code

Embedded Database

Couchbase Lite 0.5 MB

Why do I need Couchbase Lite?

Mobile Myths:

The mobile network is:

1. Always Available
2. Always High Performing

How Couchbase Lite tackles the Mobile Myths

Local data is always faster, but I need to save the data non-locally and/or I need to send data to other mobile devices.

EZ Data Syncing with Couchbase Sync Gateway

https://github.com/couchbase/sync_gateway

Channels

{"data":"yes"}

• Authentication & Sessions
• Definable channel rules via JavaScript

http(s):// REST server

How Sync Gateway Works

Written in:

Data Flow:

CRUD:

Who is using Couchbase Lite ?

How [company shown as logo] Uses Couchbase Lite

https://youtu.be/tYolHnbCavA

What Big Data solution is ready for the next 20-plus years?

Who is LexisNexis?

LexisNexis is a provider of legal, tax, regulatory, news, and business information and analysis to legal, corporate, government, accounting, and academic markets.

LexisNexis has been in business since 1977 with over 30,000 employees worldwide.

What is HPCC Systems?

LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise, and supports HPCC Systems as an open source project under the Apache 2.0 License.

Comparison

JAVA C++

Petabytes

1-80,000 Jobs/day

Since 2005

Exabytes

Since 2000

Indexed: 2K-3K Jobs/sec*

? ? ? ? ? ?

Thor Roxie

Block Based File Based

In-Memory: 30 - 40 Jobs/min*

Non-Indexed: 4-1,040,000 Jobs/day

 *based on job (size / result set / complexity)

“I’m sub-second fast.”

“I can query all or part of your data.”

Thor Roxie

Single Threaded

Hard Disk Index (optional)

Multi-Threaded Hard Disk

Index (optional) In-memory

SSD

Either/Both

Architecture

BusinessDevelopmentCustomers1 20

Non-Indexed Full Data Set

http://hpccsystems.com/why-hpcc/benchmarks

300GB File

How is Data Stored on HPCC Systems?

Example: Customer Data, May 2010

Name   State  Age
Kevin  CA     45
Mark   MI     27
Sara   FL     64

K.. CA 45 M.. MI 27 S.. FL 64

Thor Master

Thor Slaves

Kevin CA 45 Mark MI 27 Sara FL 64

Store Data

File Name ~/customers_2010-05

Data is distributed evenly in the cluster with replica copies and is seen as a file (example below).
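To make the idea of even distribution with replica copies concrete, here is a minimal sketch in Python. It is not how HPCC Systems actually places file parts: the round-robin placement, the "next node holds the replica" rule, and the names `distribute`, `num_nodes`, and `replicas` are all assumptions made for illustration.

```python
def distribute(records, num_nodes, replicas=1):
    """Spread records round-robin over nodes (assumed placement rule).

    Each record gets one primary copy plus `replicas` copies on the
    following nodes, so `replicas` should be smaller than `num_nodes`.
    """
    nodes = [{"primary": [], "replica": []} for _ in range(num_nodes)]
    for i, record in enumerate(records):
        home = i % num_nodes
        nodes[home]["primary"].append(record)
        # Replica copies go to the next nodes, wrapping around the cluster.
        for offset in range(1, replicas + 1):
            nodes[(home + offset) % num_nodes]["replica"].append(record)
    return nodes


# The three customer rows from the slide, spread over three hypothetical slaves.
customers = [("Kevin", "CA", 45), ("Mark", "MI", 27), ("Sara", "FL", 64)]
layout = distribute(customers, num_nodes=3)
```

In this sketch each slave ends up with one primary row and one replica of a neighbour's row, while a client would still refer to the whole set by a single logical file name.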


Dali

File Location & Job Scheduler

File locations are stored on disk.


Example query: What state do most people live in?

Components: ESP, Dali (File Location & Job Scheduler), Thor Master, Thor Slaves

1a. A pre-compiled query is triggered (mostly used in Roxie).
1b. Ad-hoc query.
2. Query is sent to Dali to get file locations.

3. Job is placed in a queue to be sent to the Thor Master. The Thor Master coordinates job execution on the Thor Slave nodes.

Jobs are done locally on the slaves and/or coordinated globally by the master.

4. Job is returned, optionally grouped and sorted at run time.

MI 500
CA 120
FL 7
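The "local work on slaves, global coordination by the master" pattern behind the question "What state do most people live in?" can be sketched in plain Python. This is an illustration only, not ECL and not HPCC Systems code: `local_count`, `global_merge`, and the extra rows (Ann, Lee, Joe) are hypothetical.

```python
from collections import Counter


def local_count(rows):
    """Work done on one slave: count people per state in its own rows."""
    return Counter(state for _name, state, _age in rows)


def global_merge(partial_counts):
    """Work done by the master: merge the slaves' partial counts,
    then return (state, count) pairs sorted by count, largest first."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total.most_common()


# Each inner list stands for the rows held by one slave.
slave_parts = [
    [("Kevin", "CA", 45), ("Mark", "MI", 27)],
    [("Sara", "FL", 64), ("Ann", "MI", 31)],
    [("Lee", "MI", 52), ("Joe", "CA", 38)],
]
result = global_merge(local_count(part) for part in slave_parts)
```

With these sample rows `result` lists MI first, then CA, then FL, mirroring the grouped and sorted shape of the result on the slide.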

Multiple other actions can be done on the data in a single job:

SORT, GROUP, DEDUP, JOIN, MERGE, BETWEEN, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, CASE, NORMALIZE, DENORMALIZE, K-MEANS, more ...

Sort

Count

Group

Classification

(ROXIE) 0.27 seconds to (THOR) a few hours

Country = ‘US’

Join

Index of ~/facebook_2013

Query is Completed in a Single Job, Asynchronously

~/facebook_2013

Country = ‘US’

~/twitter_2013

optional

Speed - Part 1: Indexing

• index per file
• customize by field(s)

File Name ~/customers_2010-05

File Name ~/customers_2010-05_index

1 40

Non-Indexed

1 200

To

Indexed


Example Index

male    row #345
female  row #4
male    row #97
female  row #267

Example Index

CA  row #3
MI  row #17
MI  row #4
FL  row #5
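An index of this kind maps a field value to the row numbers where it occurs, so a query can jump to matching rows instead of scanning the whole file. The sketch below shows the idea in Python; it is not the HPCC Systems index format, and `build_index`, `lookup`, and the extra row (Ann) are hypothetical.

```python
from collections import defaultdict


def build_index(rows, field):
    """Map each value of `field` to the list of row numbers holding it."""
    index = defaultdict(list)
    for row_number, row in enumerate(rows):
        index[row[field]].append(row_number)
    return dict(index)


def lookup(rows, index, value):
    """Fetch only the rows the index points at, with no full scan."""
    return [rows[i] for i in index.get(value, [])]


rows = [
    {"name": "Kevin", "state": "CA", "age": 45},
    {"name": "Mark", "state": "MI", "age": 27},
    {"name": "Sara", "state": "FL", "age": 64},
    {"name": "Ann", "state": "MI", "age": 31},  # hypothetical extra row
]
state_index = build_index(rows, "state")  # one index per file, per chosen field
michigan = lookup(rows, state_index, "MI")
```

Building a second index on another field (for example `age`) is just another call to `build_index`, which matches the "customize by field(s)" point.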

Speed - Part 2: Roxie

Roxie Master

Roxie Slaves

• Index In-Memory, or Index In-Memory & Part or All Data
• Roxie is Multi-Threaded
• SSDs are OK - write few / read many

2004

Common Cluster: Thor Master, Thor Slaves, Dali, ESP, Roxie Master, Roxie Slaves

Data is a mix of structured and unstructured. Use Thor to do ETL and send results to Roxie for user queries.

HPCC Systems 5.2

New JSON file support

https://github.com/couchbase/sync_gateway/wiki/Webhooks

Flow Data
From: Sync Gateway
To: HPCC Systems

{"data":"yes"}

Sync Gateway’s Webhooks API lets you catch every JSON document coming into Sync Gateway.

Couchbase Lite to HPCC Systems Transport

A simple Python web server that catches every HTTP POST from Sync Gateway and writes it to a file for HPCC Systems to store.
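A web server of this kind can be sketched with Python's standard library alone. This is not the author's transport code; it is a minimal illustration that assumes each POST body is a single JSON document. The file name `sync_gateway_docs.json`, port 8000, and the names `append_document`, `WebhookHandler`, and `serve` are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

OUTPUT_FILE = "sync_gateway_docs.json"  # hypothetical landing file


def append_document(raw_body, path=OUTPUT_FILE):
    """Validate one JSON document and append it as one line of the file."""
    doc = json.loads(raw_body)
    with open(path, "a", encoding="utf-8") as out:
        out.write(json.dumps(doc) + "\n")
    return doc


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        try:
            append_document(self.rfile.read(length))
            self.send_response(200)
        except ValueError:  # body was not valid JSON
            self.send_response(400)
        self.end_headers()


def serve(port=8000):
    """Block forever, writing every POSTed document to OUTPUT_FILE."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; a webhook pointed at that host and port would then produce one JSON document per line, which is a convenient shape for a later bulk load.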

https://github.com/househippo


Learning More - Couchbase Lite

INSTALL in 5 Minutes

Download: http://couchbase.com/download

Source Code: https://github.com/couchbase

http://developer.couchbase.com/mobile/get-started/get-started-mobile/index.html

Mountain View, CA; San Francisco, CA

Learning More - HPCC Systems

INSTALL in 5 Minutes

Download: http://hpccsystems.com/download/

or

Source Code: https://github.com/hpcc-systems

Atlanta, GA; Mountain View, CA

https://youtu.be/8SV43DCUqJg