Big data

Date post: 18-Feb-2017
Upload: r-prasad
Page 1: Big data

BIG DATA

Presented By, R.S.M.N.PRASAD.

(pvpsit)

OUTLINE

• Introduction

• Hadoop

• MapReduce

• Hypertable

• Advantages

BIG DATA

• Data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. Collectively, this data is called Big Data.

• Every day, we create 2.5 quintillion bytes of data (one quintillion bytes = one billion gigabytes). Data is being generated so quickly that 90% of the data in the world today was created in the last two years alone.
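The unit conversion above can be checked with a few lines of Python. This sketch assumes decimal (SI) units, where one gigabyte is 10^9 bytes and one exabyte is 10^18 bytes, which is the convention that makes "one quintillion bytes = one billion gigabytes" hold.

```python
# Decimal (SI) units: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
QUINTILLION = 10**18
GIGABYTE = 10**9
EXABYTE = 10**18


def bytes_to_gigabytes(n_bytes: float) -> float:
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / GIGABYTE


daily_bytes = 2.5 * QUINTILLION

print(bytes_to_gigabytes(QUINTILLION))  # 1000000000.0 (one billion GB)
print(bytes_to_gigabytes(daily_bytes))  # 2500000000.0 GB created per day
print(daily_bytes / EXABYTE)            # 2.5 exabytes per day
```

In other words, 2.5 quintillion bytes per day is 2.5 billion gigabytes, or 2.5 exabytes.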

IN FACT, IN A MINUTE…

• Email users send more than 204 million messages;

• Mobile Web receives 217 new users;

• Google receives over 2 million search queries;

• YouTube users upload 48 hours of new video;

• Facebook users share 684,000 pieces of content;

• Twitter users send more than 100,000 tweets;

• Consumers spend $272,000 on Web shopping;

• Apple receives around 47,000 application downloads;

• Brands receive more than 34,000 Facebook 'likes';

• Tumblr blog owners publish 27,000 new posts;

• Instagram users share 3,600 new photos;

• Flickr users, on the other hand, add 3,125 new photos;

• Foursquare users perform 2,000 check-ins;

• WordPress users publish close to 350 new blog posts.
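To put these per-minute figures in perspective, they can be scaled to a full day. The sketch below assumes the rate stays constant for all 1,440 minutes of a day, which is a simplification; since several figures are stated as "more than" or "over", the results are rough lower bounds.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# A few of the per-minute figures from the list above.
per_minute = {
    "emails sent": 204_000_000,
    "Google search queries": 2_000_000,
    "tweets": 100_000,
    "hours of YouTube video uploaded": 48,
}


def per_day(rate_per_minute: int) -> int:
    """Scale a per-minute count to a 24-hour day, assuming a constant rate."""
    return rate_per_minute * MINUTES_PER_DAY


for activity, rate in per_minute.items():
    print(f"{activity}: {per_day(rate):,} per day")
```

For example, 204 million emails per minute works out to roughly 293.76 billion emails per day.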

Big Data Vectors

• High-volume: the amount of data

• High-velocity: the speed at which data is collected, acquired, generated, or processed

• High-variety: different data types, such as audio, video, and image data

Big Data = Transactions + Interactions + Observations

What is Hadoop?

• Hadoop is an open-source software framework that analyzes structured and unstructured data and distributes applications across many servers.

• Basic application of Hadoop: Hadoop is used for maintaining, scaling, error handling, self-healing, and securing large-scale data, whether structured or unstructured. When data grows too large, traditional systems can no longer handle it, and that is the case Hadoop is built for.

HADOOP

DIFFERENT COMPONENTS

Data Access Components :- PIG & HIVE

Data Storage Components :- HBASE

Data Integration Components :- APACHE FLUME, SQOOP, CHUKWA

Data Management Components :- AMBARI, ZOOKEEPER

Data Serialization Components :- THRIFT & AVRO

Data Intelligence Components :- APACHE MAHOUT, DRILL

What does it do?

• Hadoop implements Google’s MapReduce, using HDFS (the Hadoop Distributed File System).

• MapReduce divides applications into many small blocks of work.

• HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster.

• MapReduce can then process the data where it is located.

• Hadoop’s target is to run on clusters on the order of 10,000 nodes.
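The block-and-replica idea above can be sketched as a toy Python model. This is not HDFS code: the 128 MiB block size and replication factor of 3 are assumptions (they match common HDFS defaults), placement is plain round-robin rather than HDFS's rack-aware policy, and all function names are hypothetical. The last function illustrates "process the data where it is located" by running each block's task on a node that already holds a replica.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (a common HDFS default)
REPLICATION = 3                 # assumed replicas per block (HDFS's default)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Sizes of the fixed-size blocks a file is cut into; the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Put each block on `replication` distinct nodes (round-robin, not rack-aware)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + i) % len(nodes)] for i in range(replication)]
        for block_id in range(num_blocks)
    }


def assign_tasks(placement: dict[int, list[str]]) -> dict[int, str]:
    """One task per block, run on the least-loaded node that already holds a replica."""
    load: dict[str, int] = {}
    assignment = {}
    for block_id, holders in placement.items():
        node = min(holders, key=lambda n: load.get(n, 0))
        assignment[block_id] = node
        load[node] = load.get(node, 0) + 1
    return assignment


nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(300 * 1024 * 1024)  # two full 128 MiB blocks + one 44 MiB block
placement = place_replicas(len(blocks), nodes)
tasks = assign_tasks(placement)
print(tasks)  # {0: 'node1', 1: 'node2', 2: 'node3'}
```

Because every task runs on a node that stores its block, no block has to be copied over the network before processing starts.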

How does MapReduce work?

• The runtime partitions the input and provides it to different Map instances.

• Map: (key, value) → list of (key’, value’) pairs.

• The runtime collects the (key’, value’) pairs and distributes them to several Reduce functions so that each Reduce function gets all the pairs with the same key’.

• Each Reduce produces a single (or zero) file output.

• Map and Reduce are user-written functions.
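A minimal single-process sketch of this flow, using word count as the user-written Map and Reduce functions. `run_mapreduce` is a hypothetical helper, not Hadoop's API; hashing the key to pick a reducer stands in for hash-based partitioning, and a reducer that receives no keys returns an empty result, mirroring the "zero output" case.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn, num_reducers=2):
    # 1. Map phase: apply the user's map function to every (key, value) input pair.
    intermediate = [pair for key, value in inputs for pair in map_fn(key, value)]

    # 2. Shuffle: route each (key', value') pair by its key', so every reducer
    #    receives all the values for the keys assigned to it.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for k, v in intermediate:
        partitions[hash(k) % num_reducers][k].append(v)

    # 3. Reduce phase: each reducer produces its own output (one "file" per reducer).
    return [{k: reduce_fn(k, vs) for k, vs in sorted(part.items())} for part in partitions]


def word_count_map(line_no, line):
    """Emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word, counts):
    """Sum all the 1s emitted for this word."""
    return sum(counts)


lines = [(0, "big data is big"), (1, "data is everywhere")]
outputs = run_mapreduce(lines, word_count_map, word_count_reduce)
merged = {k: v for out in outputs for k, v in out.items()}
print(merged)  # counts: big=2, data=2, is=2, everywhere=1
```

Which reducer receives which word depends on the hash, but every occurrence of a given word always lands on the same reducer.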

HYPERTABLE

What is it?

• Open-source Bigtable clone

• Manages massive sparse tables with timestamped cell versions

• Single primary key index

What is it not?

• No joins

• No secondary indexes (not yet)

• No transactions (not yet)
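The data model described above can be illustrated with a toy in-memory table. This is a hypothetical sketch, not Hypertable's storage format: only cells that were actually written are stored (sparse), each cell keeps timestamped versions, and the row key is the only index, so lookups and range scans go through row keys alone.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class SparseTable:
    """Toy model: row key -> column -> [(timestamp, value), ...], newest first.
    Only the row key is indexed (kept sorted); there are no secondary indexes."""
    row_keys: list = field(default_factory=list)  # sorted primary-key index
    rows: dict = field(default_factory=dict)

    def put(self, row, column, timestamp, value):
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first

    def get(self, row, column, timestamp=None):
        """Latest value at or before `timestamp` (latest overall if None), else None."""
        for ts, value in self.rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Yield (row_key, cells) for start_row <= key < end_row, in key order."""
        lo = bisect.bisect_left(self.row_keys, start_row)
        hi = bisect.bisect_left(self.row_keys, end_row)
        for key in self.row_keys[lo:hi]:
            yield key, self.rows[key]


t = SparseTable()
t.put("com.example/", "title", 1, "Old title")
t.put("com.example/", "title", 2, "New title")
t.put("org.sample/", "anchor:com.example/", 1, "link")
print(t.get("com.example/", "title"))               # New title
print(t.get("com.example/", "title", timestamp=1))  # Old title
print([k for k, _ in t.scan("a", "n")])             # ['com.example/']
```

Finding every row that has a particular column value would require a full scan here, which is what "no secondary indexes" means in practice.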

SCALING

TABLE: VISUAL REPRESENTATION

TABLE: ACTUAL REPRESENTATION

SYSTEM OVERVIEW

RANGE SERVER

• Manages ranges of table data

• Caches updates in memory (Cell Cache)

• Periodically spills (compacts) cached updates to disk (CellStore)
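The cache-then-spill behaviour of a range server can be sketched as follows. This is a hypothetical model, not Hypertable's code: the spill threshold is counted in cells, and "on-disk" CellStores are kept in memory as sorted lists for simplicity. Reads check the cell cache first, then CellStores from newest to oldest, so the most recent update for a key always wins.

```python
import bisect


class RangeServerSketch:
    """Toy range server: recent updates sit in an in-memory cell cache and are
    periodically spilled (compacted) into immutable, sorted CellStores."""

    def __init__(self, cache_limit: int = 3):
        self.cache_limit = cache_limit  # hypothetical spill threshold, in cells
        self.cell_cache = {}            # in-memory updates: key -> value
        self.cell_stores = []           # spilled runs: sorted [(key, value)], oldest first

    def update(self, key, value):
        self.cell_cache[key] = value
        if len(self.cell_cache) >= self.cache_limit:
            self.compact()

    def compact(self):
        """Write the cached updates out as a new sorted CellStore and empty the cache."""
        if self.cell_cache:
            self.cell_stores.append(sorted(self.cell_cache.items()))
            self.cell_cache = {}

    def read(self, key):
        """Check the cache first, then CellStores from newest to oldest."""
        if key in self.cell_cache:
            return self.cell_cache[key]
        for store in reversed(self.cell_stores):
            keys = [k for k, _ in store]
            i = bisect.bisect_left(keys, key)  # binary search in the sorted run
            if i < len(store) and store[i][0] == key:
                return store[i][1]
        return None


rs = RangeServerSketch(cache_limit=2)
rs.update("row-a", 1)
rs.update("row-b", 2)   # cache is full, so it spills into the first CellStore
rs.update("row-a", 10)  # newer value stays in the cell cache
print(rs.read("row-a"), rs.read("row-b"))  # 10 2
```

Writes stay cheap because they only touch memory; the cost of writing to disk is paid in larger, sorted batches.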

PERFORMANCE OPTIMIZATIONS

Block Cache

• Caches CellStore blocks

• Blocks are cached uncompressed
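A block cache can be sketched as a least-recently-used (LRU) map from block identifiers to decompressed block bytes. The LRU eviction policy, the zlib codec, and the `(cellstore_id, block_index)` key are assumptions for illustration. Storing blocks uncompressed means a cache hit skips both the disk read and the decompression.

```python
import zlib
from collections import OrderedDict


class BlockCache:
    """LRU cache of decompressed CellStore blocks keyed by (cellstore_id, block_index)."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # maximum number of cached blocks
        self._blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, read_compressed):
        """Return the uncompressed block, reading and decompressing it only on a miss."""
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._blocks[key]
        self.misses += 1
        data = zlib.decompress(read_compressed(key))  # "disk" read + decompression
        self._blocks[key] = data                      # cached uncompressed
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)          # evict the least recently used block
        return data


disk = {("cs1", i): zlib.compress(f"block {i}".encode()) for i in range(3)}
cache = BlockCache(capacity=2)
print(cache.get(("cs1", 0), disk.__getitem__))  # b'block 0' (miss)
print(cache.get(("cs1", 0), disk.__getitem__))  # b'block 0' (hit)
```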

Bloom Filter

• Avoids unnecessary disk access

• Filters by rows or by rows + columns

• Configurable false positive rate
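A Bloom filter answers "is this key possibly in the CellStore?" with no false negatives and a tunable false positive rate, so a "no" answer lets the server skip a disk read entirely. The sketch below is a generic Bloom filter, not Hypertable's implementation: it uses the standard sizing formulas and double hashing over a SHA-256 digest. Filtering by rows + columns is shown by adding a combined key built with a hypothetical `|` separator.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
bf.add("com.example/")        # row mode
bf.add("com.example/|title")  # row + column mode (hypothetical key format)
print(bf.might_contain("com.example/"))  # True
```

A lower false positive rate costs more bits per key; with 1,000 keys at 1%, the formulas give about 9,586 bits and 7 hash functions.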

Access Groups

• Physically store co-accessed columns together

• Improve performance by minimizing I/O
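The I/O benefit of access groups can be shown with a simplified cost model. It assumes that each access group is stored and read as a unit, so reading any column in a group reads the whole group; the column names and sizes are hypothetical.

```python
def bytes_read(columns_needed, column_sizes, access_groups):
    """Bytes scanned to read `columns_needed` from one row, assuming every
    access group containing a needed column is read in full."""
    touched = [cols for cols in access_groups.values()
               if any(c in cols for c in columns_needed)]
    return sum(column_sizes[c] for cols in touched for c in cols)


column_sizes = {"title": 100, "url": 60, "html": 50_000}  # hypothetical sizes in bytes
one_group = {"default": ["title", "url", "html"]}
split_groups = {"meta": ["title", "url"], "content": ["html"]}

print(bytes_read(["title"], column_sizes, one_group))     # 50160
print(bytes_read(["title"], column_sizes, split_groups))  # 160
```

Keeping the small, frequently read columns apart from the large page body means a title lookup no longer drags the HTML off disk.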

ADVANTAGES

• Flexible: Easily accesses structured and unstructured data.

• Scalable: It can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.

• Efficient: By distributing the data, it can process it in parallel on the nodes where the data is located.

• Resistant to Failure: It automatically maintains multiple copies of data and automatically redeploys computing tasks when failures occur.
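Because every block has several replicas, a task that was running on a failed node can be restarted on another node that holds a copy of the same block. The sketch below assumes a hypothetical block-to-replica `placement` map and a task `assignment` map, and simply moves each affected task to the first surviving replica holder; re-creating the lost replicas is left out.

```python
def reschedule_after_failure(assignment, placement, failed_node, alive_nodes):
    """Move every task that ran on `failed_node` to a surviving node that holds
    a replica of the task's block; other tasks keep their current node."""
    new_assignment = dict(assignment)
    for block_id, node in assignment.items():
        if node != failed_node:
            continue
        candidates = [n for n in placement[block_id]
                      if n != failed_node and n in alive_nodes]
        if not candidates:
            raise RuntimeError(f"no surviving replica for block {block_id}")
        new_assignment[block_id] = candidates[0]
    return new_assignment


placement = {0: ["node1", "node2", "node3"], 1: ["node2", "node3", "node4"]}
assignment = {0: "node1", 1: "node2"}
new = reschedule_after_failure(assignment, placement, "node1", {"node2", "node3", "node4"})
print(new)  # {0: 'node2', 1: 'node2'}
```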

QUERIES?
