
Ets train ppt_big_data_basics_v2.0

Date post: 15-Apr-2017
Upload: eclipse-techno-consulting-global-p-ltd
Big Data Basics AUTHOR : MITHUN BANERJEE DATE: 05-OCTOBER-2016 COPYRIGHT PROTECTED BY ECLIPSE TECHNOCONSULTING GLOBAL (P) LTD.
Transcript
Page 1: Ets train ppt_big_data_basics_v2.0

Big Data Basics

AUTHOR: MITHUN BANERJEE

DATE: 05-OCTOBER-2016

COPYRIGHT PROTECTED BY ECLIPSE TECHNOCONSULTING GLOBAL (P) LTD.

Page 2: Ets train ppt_big_data_basics_v2.0

What is Big data?

"Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." -- Wikipedia

Is the above definition fully comprehensive?

Let's go deeper in the next slides.

Page 3: Ets train ppt_big_data_basics_v2.0

Data units to measure exponential growth of data over the years

VOLUME of DATA
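
As a rough illustration of the data units mentioned on this slide, the sketch below converts a raw byte count into the largest fitting unit. It assumes decimal (SI) units, where each step up is a factor of 1000; the function name `human_readable` is hypothetical and not part of the original deck.

```python
# Decimal (SI) data units; each step up is a factor of 1000 (an assumption,
# binary units would use 1024 instead).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:.1f} {UNITS[index]}"
```

For example, `human_readable(3 * 10**12)` reports the value in terabytes, which gives a feel for how quickly the unit names climb as volumes grow.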

Page 4: Ets train ppt_big_data_basics_v2.0

Type of data

• Relational Data (Tables/Transaction/Legacy Data)

• Text Data (Web)

• Semi-structured Data (XML)

• Graph Data: Social Network, Semantic Web (RDF), …

• Streaming Data: you can only scan the data once

• A single application can be generating/collecting many types of data

• Big Public Data (online, weather, finance, etc)

Variety (complexities) of data
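
The "scan the data once" constraint on streaming data can be sketched with a one-pass summary. The example below uses Welford's method to keep a running mean and variance without storing the stream; the class and function names are hypothetical and chosen only for illustration.

```python
class RunningStats:
    """Mean and variance maintained in a single pass (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one value has been seen.
        return self._m2 / self.count if self.count else 0.0


def summarize(stream):
    """Consume an iterable exactly once and return its running statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

Each value is seen once and then discarded, so memory use stays constant no matter how long the stream runs.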

Page 5: Ets train ppt_big_data_basics_v2.0

Velocity of data

Late decisions mean missed opportunities.

Example: healthcare monitoring. Sensors monitor your activities and body; any abnormal measurement requires an immediate reaction.

• Social media and networks (all of us are generating data)

• Scientific instruments (collecting all sorts of data)

• Sensor technology and networks (measuring all kinds of data)

REAL TIME / FAST DATA
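
The healthcare monitoring example can be sketched as a check that runs on every reading as it arrives, rather than in a later batch. This is a minimal illustration only: the function name, the window size, and the tolerance are hypothetical values, not clinical thresholds and not taken from the deck.

```python
from collections import deque


def detect_abnormal(readings, window=5, tolerance=20.0):
    """Yield (index, value) for readings far from the recent average."""
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > tolerance:
                yield index, value
                continue  # keep the outlier out of the baseline
        recent.append(value)
```

Because the function is a generator, an alert is produced as soon as the abnormal reading is consumed, which is the point of processing fast data in real time.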

Page 6: Ets train ppt_big_data_basics_v2.0

3Vs: Volume, Variety, Velocity

Page 7: Ets train ppt_big_data_basics_v2.0

4Vs

Page 8: Ets train ppt_big_data_basics_v2.0

Generation and Consumption of Data

In past

In present

OLTP: ONLINE TRANSACTION PROCESSING (DBMS)

OLAP: ONLINE ANALYTICAL PROCESSING (DATA WAREHOUSING)

RTAP: REAL-TIME ANALYTICS PROCESSING (BIG DATA ARCHITECTURE & TECHNOLOGY)
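
The difference between the first two processing styles can be sketched with Python's built-in `sqlite3` module. The `orders` table and its columns are hypothetical, and a single in-memory database stands in for what would normally be separate transactional and warehouse systems. RTAP is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: a short transaction that inserts individual rows.
with conn:
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("east", 10.0), ("west", 25.0), ("east", 5.0)],
    )

# OLAP-style work: one read that scans and aggregates many rows.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)
```

The write path touches a few rows at a time, while the analytical query reads the whole table to produce a summary per region.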

Page 9: Ets train ppt_big_data_basics_v2.0

Driver of Data

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time

- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

Page 10: Ets train ppt_big_data_basics_v2.0

The Evolution of Business Intelligence

• BI Reporting, OLAP & Data Warehouse: Business Objects, SAS, Informatica, Cognos, other SQL reporting tools

• Interactive Business Intelligence & In-memory RDBMS: QlikView, Tableau, HANA

• Big Data: Real Time & Single View: Graph Databases

• Big Data: Batch Processing & Distributed Data Store: Hadoop/Spark; HBase/Cassandra

Timeline: 1990's, 2000's, 2010's

Axes: Speed, Scale

Page 11: Ets train ppt_big_data_basics_v2.0

Topic 1: Data Analytics & Data Mining

• EXPLORATORY DATA ANALYSIS

• LINEAR CLASSIFICATION (PERCEPTRON & LOGISTIC REGRESSION)

• LINEAR REGRESSION

• C4.5 DECISION TREE

• APRIORI

• K-MEANS CLUSTERING

• EM ALGORITHM

• PAGERANK & HITS

• COLLABORATIVE FILTERING
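
To give a flavour of one technique from this list, here is a minimal sketch of k-means clustering (Lloyd's algorithm) on 2-D points. It assumes the caller supplies the starting centroids so that the result is deterministic; the function name and signature are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c
            else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters
```

The two alternating steps, assignment and update, are the whole algorithm; production libraries add smarter initialization and a convergence check.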

Page 12: Ets train ppt_big_data_basics_v2.0

Topic 2: Hadoop/MapReduce Programming & Data Processing

ARCHITECTURE OF HADOOP, HDFS, AND YARN

PROGRAMMING ON HADOOP

BASIC DATA PROCESSING: SORT AND JOIN

INFORMATION RETRIEVAL USING HADOOP

DATA MINING USING HADOOP (KMEANS+HISTOGRAMS)

MACHINE LEARNING ON HADOOP (EM)

HIVE/PIG

HBASE AND CASSANDRA
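
The MapReduce programming model behind this topic can be sketched in plain Python with a word count. This runs in a single process and does not use any Hadoop API; the function names are hypothetical and only mirror the map, shuffle, and reduce phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

On a real cluster the map and reduce functions run in parallel on many machines and the framework performs the shuffle over the network, but the contract between the three phases is the same.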

Page 13: Ets train ppt_big_data_basics_v2.0

Topic 3: Graph Database and Graph Analytics

GRAPH DATABASE (http://en.wikipedia.org/wiki/Graph_database)

Native Graph Database (Neo4j)

Pregel/Giraph (Distributed Graph Processing Engine)

NEO4J/TITAN/GRAPHLAB/GRAPHSQL

Page 14: Ets train ppt_big_data_basics_v2.0

References for in-depth homework reading

• Hadoop: The Definitive Guide, Tom White, O’Reilly

• Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han et al.

• https://www.mongodb.com/collateral/big-data-examples-and-guidelines-enterprise-decision-maker

• http://www.aptude.com/blog/entry/hadoop-vs-mongodb-which-platform-is-better-for-handling-big-data

• http://www.slideshare.net/wlaforest/an-introduction-to-big-data-nosql-and-mongodb

• http://www.infoworld.com/article/2608460/application-development/the-10-worst-big-data-practices.html

Page 15: Ets train ppt_big_data_basics_v2.0

THANK YOU ETS

