Date posted: 16-Aug-2015
Category: Data & Analytics
Uploaded by: sourabhdattawad
BIG DATA
Sourabh Dattawad
Department of Computer Science
KLS Gogte Institute of Technology
Belgaum, India
Contents
• Introduction
• What is Big Data?
• Characteristics of Big Data
• What is Big Data analytics?
• How does Big Data work?
• Applications of Big Data
• Big Data growth
• What’s trending
• Conclusion
• References
Introduction
• A decade ago, far less data was being produced.
• Today the amount of data in the world is increasing rapidly, outstripping not only our machines but also our imagination.
What can be done with this data?
• Discarding this data is not a good idea.
• Big Data has the potential to help companies improve operations and make faster, more intelligent, and more accurate decisions.
• More accurate analyses lead to more confident and effective decision making, and better decisions can mean lower costs and reduced risk.
Definition
• Big Data is a term for a diverse field of data analysis in which the datasets are so massive that they become hard to store, process, analyze, and use for prediction with traditional databases and software.
Characteristics of Big Data
• Big Data is characterized by volume, velocity, and variety.
Volume
• Volume is the quantity of data generated; it determines the value and potential of the data.
• Facebook receives more than 12 million photos every hour.
• More than 400 million tweets are posted on Twitter every day.
Velocity
• Velocity is the rate at which data is generated.
• Every minute, 48 hours of new video are uploaded to YouTube.
• Every minute, Google processes 2 million search queries.
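The volume and velocity figures above are quoted per hour, per day, and per minute, which makes them hard to compare. As a rough worked example, the sketch below converts them to a common scale. The input numbers are the ones quoted on the slides; the variable names are illustrative only.

```python
# Figures quoted on the slides, each in its original time unit.
PHOTOS_PER_HOUR = 12_000_000       # Facebook photo uploads
TWEETS_PER_DAY = 400_000_000       # tweets posted on Twitter
VIDEO_HOURS_PER_MINUTE = 48        # hours of video uploaded to YouTube
SEARCHES_PER_MINUTE = 2_000_000    # Google search queries

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

# Convert everything to a per-day (or per-second) rate.
photos_per_day = PHOTOS_PER_HOUR * 24
video_hours_per_day = VIDEO_HOURS_PER_MINUTE * MINUTES_PER_DAY
searches_per_day = SEARCHES_PER_MINUTE * MINUTES_PER_DAY
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

print(f"Photos per day:        {photos_per_day:,}")
print(f"Video hours per day:   {video_hours_per_day:,}")
print(f"Searches per day:      {searches_per_day:,}")
print(f"Tweets per second:     {tweets_per_second:,.0f}")
```

On these figures, roughly 69,000 hours of video arrive every day, which is about eight years of footage.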
Variety
• Variety is the category to which the data belongs.
• The categories include health care, social networking, banking, and so on.
What is Big Data analytics?
• Big Data analytics is the process of analyzing large data sets and drawing conclusions from them.
• Two real-life examples illustrate it:
– Google’s Flu Trends
– Target, the retailer
Google’s Flu Trends
• Google predicted flu trends just by analyzing search data.
• In 2009 a new flu virus, H1N1, was discovered.
• Seasonal flu causes 250,000 to 500,000 deaths worldwide every year.
• A swine flu pandemic could be worse.
• Surveillance is carried out by the Centers for Disease Control and Prevention (CDC).
Problems faced by the CDC:
– Reports are compiled weekly
– 1-2 week publication lag
What did they do?
• Google took the 50 million most common search terms typed in the United States and compared their frequency with CDC data on the spread of the flu.
• They processed 450 million different models to test the search terms, and the resulting predictions closely matched the statistics reported by the CDC.
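Comparing search-term frequency with reported flu cases can be illustrated on a very small scale. The sketch below is not Google's actual method: it assumes Pearson correlation as the comparison measure, and every term and weekly number in it is hypothetical, invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_terms(term_counts, reported_cases):
    """Rank search terms by how closely their weekly counts track reported cases."""
    scores = {
        term: pearson(counts, reported_cases)
        for term, counts in term_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weekly data (assumption): five weeks of reported cases and
# five weeks of search counts for three made-up terms.
reported_cases = [10, 20, 40, 80, 60]
term_counts = {
    "flu symptoms": [100, 210, 390, 820, 590],
    "cough medicine": [50, 60, 90, 100, 120],
    "football scores": [300, 280, 310, 290, 305],
}

ranking = rank_terms(term_counts, reported_cases)
for term, score in ranking:
    print(f"{term:16s} {score:+.3f}")
```

Terms whose search counts rise and fall with the reported cases score close to 1 and would be kept as predictors, while unrelated terms score near 0.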
Target Retailer
• Target, the retailer, predicted a customer's pregnancy just by analyzing her buying trends.
• The best-known case is the story of a pregnant teenager.
• The lesson drawn here is that real-time data is never false.
How does Big Data work?
• Apache Hadoop - Apache Hadoop is the software most commonly associated with Big Data. Apache describes it as “a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models”.
• With Hadoop, no data set is too big. A job that takes a traditional system more than 20 hours can be processed in as little as 3 minutes.
• MapReduce - MapReduce is used to split the data effectively. It is a software framework that splits the input data set into independent chunks, which are then processed in a completely parallel manner.
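The split, map, and reduce idea can be sketched in a few lines of plain Python. This is a minimal single-machine illustration and does not use the Hadoop API: the function names are hypothetical, and a thread pool stands in for the cluster nodes that would process the chunks in a real deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, num_chunks):
    """Split the input into roughly equal, independent chunks."""
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, num_chunks=2):
    """Count words by splitting, mapping the chunks concurrently, then reducing."""
    chunks = split_into_chunks(lines, num_chunks)
    # The chunks are independent, so the map step can be handed to
    # separate workers (threads here, cluster nodes in a real system).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data is big", "data about data"]
    print(word_count(sample))
```

Because no chunk depends on any other, adding more workers shortens the map step without changing the result, which is the property that lets this model scale across a cluster.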
Simple Block Diagram
Applications of Big Data
Big Data Growth
What’s trending
• By analyzing the Big Data of DNA, it may become possible to cure genetic diseases such as cancer.
• Analyzing data alone may even make it possible to predict where terrorists will try to attack.
Conclusion
• Big Data is the next big thing. It's about letting the data speak, and real-time data is never false; hence it is a revolution that will transform how we think, live, and work.
References
• Viktor Mayer-Schönberger and Kenneth Cukier, “Big Data: A Revolution That Will Transform How We Live, Work, and Think”.
• Cathy O'Neil and Rachel Schutt, “Doing Data Science”, O'Reilly Media.
• http://hadoop.apache.org
Thank You