Date post: | 15-Jul-2015 |
Category: | Technology |
Upload: | skillspeed |
Slide 1© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
MapReduce and Pig Comparison
Session Objectives
This session will help you with the following:
ᗍ Introduction to Big Data and Hadoop
ᗍ Understanding MapReduce and Pig Latin
ᗍ Comparative Analysis of MapReduce & Pig
ᗍ Big Data & Hadoop Course Syllabus
ᗍ Webinar by Skillspeed
Get Started with BIG Data & Hadoop
Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.
Managing data at this scale is very difficult.
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing such big data is becoming a problem for all of us.
Hadoop can be used to process such huge data easily.
How? We will answer that shortly.
Before that, let's understand what Hadoop is.
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management technology with scale-out storage and distributed processing.
Hadoop Characteristics
Flexible
Reliable
Economical
Scalable
Hadoop Ecosystem
ᗍ Data import or export: Flume (unstructured or semi-structured data), Sqoop (structured data)
ᗍ Apache Oozie: workflow
ᗍ Pig Latin: data analysis
ᗍ Hive: DW system
ᗍ HBase
ᗍ MapReduce framework
ᗍ Other YARN frameworks (MPI, Giraph)
ᗍ YARN: cluster resource management
ᗍ HDFS (Hadoop Distributed File System)
Map Reduce
MapReduce Use Cases
Problem Statement:
Find maximum stock market levels recorded in a span of 5 years
Problem Statement:
De-identify personal identifier information
Traditional Solution
[Diagram: very big data is divided into several pieces of split data; grep runs on each piece and produces matches; cat combines them into all matches]
MapReduce Solution
[Diagram: very big input is divided into several pieces of split data; inside the MapReduce framework, MAP runs on each piece and REDUCE combines the results into all matches]
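The split, grep and cat pipeline from the two solution slides can be sketched in a few lines of Python. This is only an illustration on one machine, not how Hadoop is implemented: the function names, the `log_lines` sample data and the choice of three splits are assumptions made for the example.

```python
import re
from typing import Iterable, List


def split_data(lines: List[str], n_splits: int) -> List[List[str]]:
    """Divide the input into roughly equal chunks (the "Split Data" boxes)."""
    size = max(1, -(-len(lines) // n_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(pattern: str, chunk: Iterable[str]) -> List[str]:
    """MAP role: keep only the lines of one chunk that match the pattern."""
    regex = re.compile(pattern)
    return [line for line in chunk if regex.search(line)]


def cat(per_chunk_matches: Iterable[List[str]]) -> List[str]:
    """REDUCE role: concatenate the per-chunk matches into "All matches"."""
    return [line for matches in per_chunk_matches for line in matches]


# Hypothetical log lines standing in for the very big data.
log_lines = [
    "INFO start",
    "ERROR disk full",
    "INFO retry",
    "ERROR timeout",
    "INFO done",
]

chunks = split_data(log_lines, n_splits=3)
all_matches = cat(grep("ERROR", chunk) for chunk in chunks)
print(all_matches)  # ['ERROR disk full', 'ERROR timeout']
```

In the traditional solution you would have to distribute the chunks and collect the matches yourself; in the MapReduce solution the framework takes over that plumbing and you supply only the two functions.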
Understanding MapReduce Paradigm
MapReduce Word Count Process Flow
Input (K1,V1): Jack Bill Joe Don Don Joe Jack Don Bill
Splitting: Jack Bill Joe | Don Don Joe | Jack Don Bill
Mapping, List(K2,V2): Jack, 1 Bill, 1 Joe, 1 | Don, 1 Don, 1 Joe, 1 | Jack, 1 Don, 1 Bill, 1
Shuffling, K2,List(V2): Bill, (1,1) | Don, (1,1,1) | Jack, (1,1) | Joe, (1,1)
Reducing, List(K3,V3): Bill, 2 | Don, 3 | Jack, 2 | Joe, 2
Final Result: Bill, 2 Don, 3 Jack, 2 Joe, 2
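The word count flow above can be reproduced with a small in-memory simulation. This is a minimal sketch that mimics the mapping, shuffling and reducing stages on one machine; it does not use the Hadoop API, and the function names are chosen for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(split: str) -> Iterator[Tuple[str, int]]:
    """Mapping: emit (word, 1) for every word in one input split."""
    for word in split.split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffling: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducing: sum the list of ones collected for one word."""
    return (key, sum(values))


# The three input splits from the slide.
splits = ["Jack Bill Joe", "Don Don Joe", "Jack Don Bill"]

mapped = (pair for split in splits for pair in map_words(split))
grouped = shuffle(mapped)
word_counts = dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))
print(word_counts)  # {'Bill': 2, 'Don': 3, 'Jack': 2, 'Joe': 2}
```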
MapReduce Anatomy
MapReduce works on key-value pairs:
Map: (K1, V1) → List (K2, V2)
Reduce: (K2, List (V2)) → List (K3, V3)
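To make these two signatures concrete, here is a hedged sketch of a tiny single-process driver that accepts any mapper and reducer of those shapes. It is applied to the "maximum stock level" use case mentioned earlier. The `run_mapreduce` helper, the record layout (`"SYMBOL,price"`) and the sample values are all assumptions for illustration, and the keys are assumed to be sortable.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
K3 = TypeVar("K3")
V3 = TypeVar("V3")


def run_mapreduce(
    records: Iterable[Tuple[K1, V1]],
    mapper: Callable[[K1, V1], Iterable[Tuple[K2, V2]]],
    reducer: Callable[[K2, List[V2]], Iterable[Tuple[K3, V3]]],
) -> List[Tuple[K3, V3]]:
    """Apply Map: (K1, V1) -> List(K2, V2), group by K2, then Reduce."""
    grouped: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            grouped[k2].append(v2)
    results: List[Tuple[K3, V3]] = []
    for k2 in sorted(grouped):  # reducers see keys in sorted order
        results.extend(reducer(k2, grouped[k2]))
    return results


def stock_mapper(offset: int, line: str) -> Iterable[Tuple[str, float]]:
    """(line offset, "SYMBOL,price") -> [(SYMBOL, price)]"""
    symbol, price = line.split(",")
    return [(symbol, float(price))]


def max_reducer(symbol: str, prices: List[float]) -> Iterable[Tuple[str, float]]:
    """(SYMBOL, [prices]) -> [(SYMBOL, highest price)]"""
    return [(symbol, max(prices))]


# Hypothetical input records: (line offset, line text).
records = [
    (0, "ACME,101.5"),
    (1, "ACME,98.0"),
    (2, "GLOBEX,55.2"),
    (3, "ACME,120.3"),
    (4, "GLOBEX,60.1"),
]

max_levels = run_mapreduce(records, stock_mapper, max_reducer)
print(max_levels)  # [('ACME', 120.3), ('GLOBEX', 60.1)]
```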
MapReduce Advantages
[Diagram: data center, rack, node, HDFS blocks (a, b, c) and map task]
The two biggest advantages of Map Reduce are:
ᗍ It takes processing to the data
ᗍ It allows processing of data in parallel
Input Splits in MapReduce
Input data is divided in two ways:
ᗍ HDFS blocks: physical division
ᗍ Input splits: logical division
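The difference between the two divisions can be illustrated with a short sketch. It assumes newline-terminated text records and a deliberately tiny block size of 20 bytes. The rule used here (a record belongs to the split in whose byte range it starts) is a simplification chosen for the example, not a description of Hadoop's exact record reader.

```python
from typing import List


def physical_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Physical division: fixed-size byte ranges that may cut a record in two."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def logical_splits(data: bytes, block_size: int) -> List[List[bytes]]:
    """Logical division: whole records, assigned by where each record starts."""
    splits: List[List[bytes]] = [[] for _ in range(0, len(data), block_size)]
    offset = 0
    for record in data.splitlines(keepends=True):
        splits[offset // block_size].append(record.rstrip(b"\n"))
        offset += len(record)
    return splits


data = b"Jack Bill Joe\nDon Don Joe\nJack Don Bill\n"

blocks = physical_blocks(data, block_size=20)
splits = logical_splits(data, block_size=20)

print(blocks)  # [b'Jack Bill Joe\nDon Do', b'n Joe\nJack Don Bill\n']
print(splits)  # [[b'Jack Bill Joe', b'Don Don Joe'], [b'Jack Don Bill']]
```

The second record straddles the block boundary, yet it reaches a single map task in one piece because the split, not the block, decides which records a task reads.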
Demonstration
Sequence Files Processing
Need for Pig
ᗍ Java is not a preferred language for many data analysts
ᗍ 200 Java LOC ~ 10 Pig LOC
ᗍ Many built-in operations are available for common data operations such as join, grouping and filtering
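To show what filtering, grouping and joining mean as data-flow steps, here is a rough Python analogy. It is not Pig Latin and does not run on Hadoop; the `users` and `visits` relations and the 10-second threshold are hypothetical, and the comments name the Pig operators that play the corresponding role.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical relations: (user id, name) and (user id, page, seconds).
users: List[Tuple[str, str]] = [("u1", "Jack"), ("u2", "Don"), ("u3", "Bill")]
visits: List[Tuple[str, str, int]] = [
    ("u1", "/home", 12),
    ("u2", "/cart", 45),
    ("u1", "/cart", 30),
    ("u3", "/home", 5),
]

# Filtering (role of Pig's FILTER ... BY): keep visits of 10 seconds or more.
long_visits = [visit for visit in visits if visit[2] >= 10]

# Grouping (role of Pig's GROUP ... BY): collect the durations per user id.
by_user: Dict[str, List[int]] = defaultdict(list)
for user_id, _page, seconds in long_visits:
    by_user[user_id].append(seconds)

# Joining (role of Pig's JOIN ... BY): attach each user's name to the total.
names = dict(users)
report = sorted((names[user_id], sum(secs)) for user_id, secs in by_user.items())
print(report)  # [('Don', 45), ('Jack', 42)]
```

In Pig each of these steps is a single built-in statement, which is where much of the line-count saving over hand-written Java MapReduce comes from.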
Need for Pig
ᗍ Useful for creating ad-hoc Map Reduce jobs on very large data sets
ᗍ Java knowledge is optional
ᗍ Much less development time
ᗍ Fewer LOC = Easier Maintenance
ᗍ Easily extensible whenever required
ᗍ Easy to learn and user-friendly
Pig Vs M/R
[Chart: lines of code, Hadoop vs. Pig. Pig needs 1/20 the lines of code]
[Chart: development time in minutes, Hadoop vs. Pig. Pig needs 1/16 the development time]
Pig Vs M/R
MapReduce
ᗍ Provides a powerful mechanism for parallel computation
ᗍ Gives more control over algorithm execution
ᗍ Very rigid in structure
Pig
ᗍ Acts as a higher-level DSL over MapReduce
ᗍ Insulates programmers from underlying Hadoop concepts
ᗍ Provides seamless integration with a range of underlying Hadoop versions
Where to use Pig?
Pig is a Data Flow language, thus it is most suitable for:
ᗍ Quickly changing data processing requirements
ᗍ Processing data from multiple channels
ᗍ Quick hypothesis testing
ᗍ Time sensitive data refreshes
ᗍ Data profiling using sampling
Where NOT to use Pig?
Pig might NOT be a preferred choice when:
ᗍ The input data format is really nasty (video, audio, free-form text, etc.)
ᗍ We need more fine-grained control over processing
ᗍ The job needs loops or complex logic: Pig lacks control structures, so such jobs often require extending Pig
ᗍ Speed is critical: Pig always adds some extra processing on top of the MapReduce logic, so Pig jobs tend to be a tad slower than equivalent MapReduce jobs
What is Expected?
In this section, we will discuss the questions on HDFS and MapReduce that are asked during interviews.
This will help you analyze the importance of the topics under study!
The Top 5 Interview Questions
What is the use of the NameNode in HDFS?
What is a DataNode in HDFS?
What is the JobTracker in MapReduce?
What is MapReduce?
What does a Hadoop application look like in terms of its basic components?
And many more…
Job Trends – Hadoop
Why SkillSpeed?
ᗍ Course Curriculum from Industry Experts
ᗍ Instructor Led Live Virtual Sessions
ᗍ Lifetime access to Course Content via LMS
ᗍ 100% Placement Assistance
ᗍ 24x7 Support
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to Map Reduce
Module 4: Advanced Map Reduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Corporate Partners
Contact us
To know more about the course, please contact:
IND +91-90660-20904 USA 1866-607-6547 (Toll Free)
Lines open 24/7
Or reach us at