
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)

Date post: 04-Jul-2015
Upload: geekslab
Description:
FOSS Sea 2014 (http://geekslab.co/events/21-foss-sea-2014-infrastructure-for-researchers): DataWarehouse & BigData, by Vladimir Slobodianiuk (Владимир Слободянюк), Delivery Manager at Luxoft
Transcript
Page 1: FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)

www.luxoft.com

DWH & Big Data

Odessa

Vladimir Slobodianiuk

Date: 2014

Agenda

1 Big Data: what is it

2 Hadoop vs RDBMS: pros and cons

3 Hadoop & Enterprise architecture

4 Hadoop as ETL engine

5 Case Studies

Big Data: what is it

Current state

Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.

Limitations & Problems

Big data is difficult to work with using most relational databases, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers.

• eBay.com uses two data warehouses at 7.5 petabytes
• Walmart handles more than 1 million customer transactions every hour
• Facebook handles 50 billion photos from its user base
• In 2012, the Obama administration announced the Big Data Research and Development Initiative

Hadoop vs RDBMS

CORE HADOOP - MapReduce

In 2004, Google published a paper on a process called MapReduce

DISTRIBUTED COMPUTING FRAMEWORK

Process large jobs in parallel across many nodes and combine the results.
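
The idea of processing in parallel and then combining the results can be illustrated with a word count, the usual introductory MapReduce example. The sketch below is only a single-process Python illustration of the map, shuffle, and reduce steps; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster each line (or block of lines) could be mapped on a
    # different node; here the map calls simply run one after another.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop processes big data"]
    print(word_count(sample))
```

Because each `map_phase` call depends only on its own line and each `reduce_phase` call only on its own key, both steps can be spread across many nodes, which is the property the slide refers to.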

Hadoop Structure

HDFS is a distributed file system designed to run on commodity hardware.

HBase stores data rows in labelled tables (a sortable key and an arbitrary number of columns).

Hive provides data summarization, query, and analysis through an SQL-like interface.

Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.

Hadoop vs RDBMS

Hadoop:
• Schema-less model
• Human query optimization
• Ability to create complex dataflows with multiple inputs and outputs
• Parallelize many analytic functions

RDBMS:
• Performance for relational data
• Machine query optimization
• Mature workload management
• High-concurrency interactive query processing

How might this change in the future:
• Query optimization improvements in Hive: statistics, better join ordering, more join types, etc.
• Startup time improvements: simpler query plans to pass out
• Runtime performance improvements

Hadoop & Enterprise architecture

Classic architecture approach

Hadoop & Enterprise architecture

Case Study 1

Hadoop as ETL Data Quality tool

PROBLEM

Traditional tools show poor performance in exception handling and data cleansing.

SOLUTION

Hadoop transforms the data into a single format and processes it using data cleansing workflows.

BENEFITS

• Reduced TCO (commodity hardware usage)
• Traceability of all data quality issues
• Hadoop becomes a clean-data tool
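
The approach in this case study (bring the data into a single format, run cleansing rules, keep every data quality issue traceable) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual implementation: the slides name no fields, source formats, or rules, so `customer_id`, `amount`, `date`, the date formats, and all function names below are hypothetical. In a Hadoop job, logic like this would typically run inside a map step over each input record.

```python
from datetime import datetime

# Hypothetical source date formats; the slides do not list the real ones.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def to_single_format(raw):
    """Normalize one raw record (a dict) into a single target layout."""
    return {
        "customer_id": str(raw.get("customer_id", "")).strip(),
        "amount": str(raw.get("amount", "")).strip().replace(",", "."),
        "date": str(raw.get("date", "")).strip(),
    }


def cleanse(record):
    """Apply cleansing rules; return (clean_record, list_of_issues)."""
    issues = []
    clean = dict(record)
    if not record["customer_id"]:
        issues.append("missing customer_id")
    try:
        clean["amount"] = float(record["amount"])
    except ValueError:
        issues.append(f"bad amount: {record['amount']!r}")
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(record["date"], fmt)
            clean["date"] = parsed.date().isoformat()
            break
        except ValueError:
            continue
    else:
        # No known format matched the date string.
        issues.append(f"bad date: {record['date']!r}")
    return clean, issues


def run_workflow(raw_records):
    """Split input into clean rows and a traceable log of rejected rows."""
    clean_rows, issue_log = [], []
    for line_no, raw in enumerate(raw_records, start=1):
        clean, issues = cleanse(to_single_format(raw))
        if issues:
            issue_log.append({"line": line_no, "issues": issues})
        else:
            clean_rows.append(clean)
    return clean_rows, issue_log
```

Keeping the rejected rows in `issue_log` with their input position and the rules they broke is one simple way to get the traceability listed among the benefits.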

Case Study 2

Know Your Customer PoC

Business Challenge

• Knowing the actual customer reaction to products is essential for business growth, but it’s difficult to get valuable insights. Social media is the place where customers really share their opinions.

SOLUTION

Hadoop-based analysis tool that provides the ability to:

• Find events in the client streams and identify the needed reaction

• Propose a product to a client, based on the client's interests

Case Study 3

Enterprise ETL & Hadoop Integration

Goals:

• MapReduce ETL job development without coding
• Build, re-use, and check impact analysis with enhanced metadata capabilities
• A Windows-based graphical development environment
• Comprehensive built-in transformations
• A library of Use Case Accelerators to fast-track Hadoop productivity

Summary

Big Data:

• Cutting edge of data integration (DI) technologies
• State-of-the-art design approaches
• More than simple development: it is something of an art, the art of data management

THANK YOU

