
Demystifying big data

Date post: 12-Apr-2017
Upload: akash-mishra
Demystifying Big Data Brown Bag
Transcript
Page 1: Demystifying big data

Demystifying Big Data

Brown Bag

Page 2: Demystifying big data

Everything starts small

Page 3: Demystifying big data

Traditional Approach

Page 4: Demystifying big data

Simple Process

Page 5: Demystifying big data

Result

Page 6: Demystifying big data

What’s next?

The unanswered question of a lifetime.

Page 7: Demystifying big data

Unquenchable thirst for improvement

❏ How to sell more?

❏ How to optimize inventory?

❏ How to engage customers more?

❏ What do my customers like?

❏ How to reduce operating costs?

Page 8: Demystifying big data

"Torture the data, and it will confess to anything."

Ronald Coase

Page 9: Demystifying big data

How to get data?

Humans…

Page 10: Demystifying big data

Ever Growing Data

❏ Historical data plays an important role.

❏ Data explodes while processing.

❏ More data beats better algorithms.

Page 11: Demystifying big data

So what is Big Data?

When data tends to grow beyond what one machine can process.

Page 12: Demystifying big data

Getting the Right Tool

Page 13: Demystifying big data

Data Parallel Processing

❏ Distribute the data (with replication)

❏ Move Computation close to Data

❏ Process each section of Data separately

❏ Aggregate the results.
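The steps above can be sketched on a single machine. This is only an illustration, not how any particular framework implements it: threads stand in for separate machines, replication and data locality are not modeled, and the function names (`partition`, `process_partition`, `aggregate`) are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Distribute the records round-robin into n_parts partitions."""
    return [records[i::n_parts] for i in range(n_parts)]


def process_partition(part):
    """Process one partition on its own: count items locally."""
    return Counter(part)


def aggregate(partials):
    """Merge the per-partition counts into one final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def data_parallel_count(records, n_parts=3):
    parts = partition(records, n_parts)
    # Each worker thread plays the role of one machine holding one partition.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(process_partition, parts))
    return aggregate(partials)


views = ["shoe", "hat", "shoe", "bag", "hat", "shoe"]
result = data_parallel_count(views)
```

The final counts do not depend on how many partitions are used, which is what lets the same job scale out by adding machines.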

Page 14: Demystifying big data

Advantages of Data Parallel Model

❏ No hardware restrictions, e.g. memory, CPU.

❏ No Scalability Issue

❏ Cost effectiveness.

❏ No Single point of failure.

Page 15: Demystifying big data

That’s nice, so problem solved. But the presentation says Hadoop, Spark?

Page 16: Demystifying big data

Challenges of Data Parallelism

❏ Data partitioning, distribution and accumulation

❏ Fault Tolerance.

❏ Distributed Coordination and management.

❏ Abstracting away the distributed complexity.

Page 17: Demystifying big data

Big Data Ecosystem

❏ Distributed Data Storage System:
    ❏ Data distribution.
    ❏ Data replication.
    ❏ High throughput with no single point of failure.

❏ Distributed Data Processing System:
    ❏ Distributing code close to data.
    ❏ Abstracting distributed complexity from the programmer.
    ❏ Fault tolerance and handling computation failure.
    ❏ Aggregating results.

❏ Distributed Coordination and Resource Management:
    ❏ Resource allocation.
    ❏ Distributed configuration management.
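To make the storage bullets concrete, here is a toy sketch of block placement with replication. It assumes a simple round-robin policy and hypothetical names (`place_blocks`, `readable_blocks`); real distributed file systems use more elaborate, topology-aware placement.

```python
def place_blocks(block_ids, nodes, replication=2):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [
            nodes[(i + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have a replica on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    }


placement = place_blocks(
    ["b0", "b1", "b2", "b3"], ["n1", "n2", "n3"], replication=2
)
```

With two replicas per block, losing any single node leaves every block readable; losing two nodes at once can still lose data, which is why the replication factor is a tunable trade-off between storage cost and availability.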

Page 18: Demystifying big data

Distributed Data Storage System

Page 19: Demystifying big data

Distributed Data Processing System

Page 20: Demystifying big data

Distributed Coordination and Resource management.

Page 21: Demystifying big data

Lambda Architecture

Page 22: Demystifying big data

How to sell more?

Recommendation.

Page 23: Demystifying big data

Speed Layer

1. Web Log

2. Product Views

3. Similar Product

4. Update user product recommendation
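The four speed-layer steps could look roughly like the sketch below. Everything specific here is an assumption for illustration: the log line format (`user=<id> view=<product>`), the `SIMILAR` lookup table (imagined as precomputed elsewhere), and the in-memory recommendation store.

```python
from collections import defaultdict

# Hypothetical precomputed similarity table (step 3).
SIMILAR = {
    "shoe": ["sock", "lace"],
    "hat": ["scarf"],
}


def parse_view(log_line):
    """Steps 1-2: extract (user, product) from a web log line, else None."""
    fields = dict(f.split("=", 1) for f in log_line.split() if "=" in f)
    if "user" in fields and "view" in fields:
        return fields["user"], fields["view"]
    return None


def update_recommendations(recs, log_line, limit=3):
    """Step 4: push products similar to the viewed one to the front."""
    view = parse_view(log_line)
    if view is None:
        return recs  # not a product view, nothing to update
    user, product = view
    for candidate in SIMILAR.get(product, []):
        if candidate in recs[user]:
            recs[user].remove(candidate)
        recs[user].insert(0, candidate)  # most recent suggestion first
    del recs[user][limit:]  # keep only the newest `limit` suggestions
    return recs


recs = defaultdict(list)
for line in ["user=u1 view=shoe", "user=u1 view=hat", "GET /health"]:
    update_recommendations(recs, line)
```

Each event is handled as it arrives, so a user's recommendations reflect their latest views without waiting for a batch job.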

Page 24: Demystifying big data

How to optimize inventory?

Prediction.

Page 25: Demystifying big data

Batch Layer

1. User Data

2. Location Cluster per item

3. Location Cluster per item Data

3. Current Warehouse inventory

4. Inventory transfer.
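A rough sketch of the batch flow above, under stated assumptions: each order already carries a location cluster label (the clustering step itself is not shown), past demand stands in for predicted demand, and transfers are planned greedily. The names `demand_per_location` and `plan_transfers` are hypothetical.

```python
from collections import Counter


def demand_per_location(orders):
    """Count ordered units per (item, location cluster)."""
    return Counter((o["item"], o["location"]) for o in orders)


def plan_transfers(demand, inventory):
    """Greedily move surplus stock to locations where demand exceeds stock."""
    transfers = []
    keys = list(demand) + list(inventory)
    for item in sorted({it for it, _ in keys}):
        locations = sorted({loc for it, loc in keys if it == item})
        # Positive balance = surplus, negative balance = shortfall.
        balance = {
            loc: inventory.get((item, loc), 0) - demand.get((item, loc), 0)
            for loc in locations
        }
        surplus = [[loc, b] for loc, b in balance.items() if b > 0]
        for loc, b in balance.items():
            need = -b
            for source in surplus:
                if need <= 0:
                    break
                moved = min(need, source[1])
                if moved > 0:
                    transfers.append((item, source[0], loc, moved))
                    source[1] -= moved
                    need -= moved
    return transfers


orders = [{"item": "shoe", "location": "north"}] * 3 + [
    {"item": "shoe", "location": "south"}
]
inventory = {("shoe", "north"): 1, ("shoe", "south"): 4}
transfers = plan_transfers(demand_per_location(orders), inventory)
```

Each transfer is a tuple `(item, from_location, to_location, units)`. Because this runs over the full history rather than per event, it fits the batch layer rather than the speed layer.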

