OM DAYAL GROUP OF INSTITUTIONS SUBMITTED BY M SHYAM SUNDER ASHISH MISHRA
Transcript
1. OM DAYAL GROUP OF INSTITUTIONS SUBMITTED BY M SHYAM SUNDER
ASHISH MISHRA
2. A BRIEF OF TODAY'S DISCUSSION What is Big Data. Origin. Now
or Never! The more the merrier! Messiness: a positive feature, not a
shortcoming. Relationships are important. Datafication: quantifying
the world. Valuing the priceless. Dark side of Big Data. Taming the
Bull. The future starts today.
3. WHAT IS BIG DATA? The word "data" comes from the Latin for
"given", in the sense of a fact. Big data is data that is too large,
complex and dynamic for conventional data tools to capture, store,
manage and analyze. Used well, Big Data lets analysts spot trends
and gain niche insights that create value and innovation much faster
than conventional methods. The four Vs that drive Big Data are
Volume, Variety, Veracity and Velocity.
4. ORIGIN OF BIG DATA Increase in storage capacity: the world's
technological per-capita capacity to store information has roughly
doubled every 40 months since the 1980s, while costs have come down
as well. High processing speed is readily available: Gordon Moore,
co-founder of Intel, observed that the number of transistors on
integrated circuits doubles approximately every two years, and a
commonly quoted corollary is that processing performance doubles
every 18 months. Conventional data analysis generally relies on
statistical models built on samples, which are prone to sampling
error and do not always reveal the true picture. Big Data does not
rely on a sample; it covers the whole data set and hence can provide
a better analysis. Google Flu Trends was a milestone event for Big
Data analysis.
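To make the doubling periods quoted on this slide concrete, here is a minimal sketch that converts each one into growth per decade. It assumes idealised, perfectly steady exponential growth; the function name `growth_factor` is illustrative and not from the presentation.

```python
def growth_factor(months_elapsed: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months_elapsed` months, assuming the
    quantity doubles once every `doubling_period_months` months."""
    return 2 ** (months_elapsed / doubling_period_months)


# Growth over one decade (120 months) for each doubling period on the slide.
storage_per_decade = growth_factor(120, 40)      # storage: 40-month doubling
transistors_per_decade = growth_factor(120, 24)  # transistors: 2-year doubling
performance_per_decade = growth_factor(120, 18)  # performance: 18-month doubling

print(f"Storage capacity:  x{storage_per_decade:.0f} per decade")
print(f"Transistor count:  x{transistors_per_decade:.0f} per decade")
print(f"Processing speed:  x{performance_per_decade:.0f} per decade")
```

Under these assumptions, storage grows about 8-fold, transistor counts 32-fold, and processing performance roughly 100-fold in ten years.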
5. PEOPLE BEHIND THE REVOLUTION! STORAGE: Stuart Parkin is an IBM
Fellow and manager of the magnetoelectronics group at the IBM
Almaden Research Center in San Jose, California. In April 2014,
Parkin was awarded the Millennium Technology Prize for his work on
spintronic materials, "leading to a prodigious growth in the
capacity to store digital information". PROCESSOR PERFORMANCE: In
July 1968, Gordon Moore co-founded NM Electronics, which later
became Intel Corporation, with Robert Noyce. Moore was awarded the
2008 IEEE Medal of Honor for "pioneering technical roles in
integrated-circuit processing, and leadership in the development of
MOS memory, the microprocessor computer and the semiconductor
industry".
6. DATA GROWTH RATE One zettabyte (ZB) = 1,000 exabytes (EB) = 1
billion terabytes (TB)
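The unit relationships on this slide can be checked with a short sketch. It assumes decimal (SI) prefixes, where each step is a factor of 1,000; the table `BYTES_PER_UNIT` and the function `convert` are illustrative names, not part of the presentation.

```python
# Decimal (SI) byte units; an assumption, since the slide does not say
# whether decimal or binary (1024-based) prefixes are meant.
BYTES_PER_UNIT = {
    "TB": 10**12,  # terabyte
    "PB": 10**15,  # petabyte
    "EB": 10**18,  # exabyte
    "ZB": 10**21,  # zettabyte
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert `value` between two of the units listed above."""
    return value * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


print(convert(1, "ZB", "EB"))  # exabytes in one zettabyte
print(convert(1, "ZB", "TB"))  # terabytes in one zettabyte
```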
7. NOW OR NEVER! The Large Hadron Collider uses about 150
million sensors delivering data 40 million times per second. There
are about 600 million collisions per second. As a result, the data
flow from all four LHC experiments amounts to an annual rate of
about 25 petabytes. Big data analysis played a large role in Barack
Obama's successful 2012 re-election campaign. Walmart handles more
than 1 million customer transactions every hour, which are imported
into databases estimated to contain more than 2.5 petabytes of data.
Google processes over 24 petabytes of data per day. Snapchat users
upload 16 million pictures per hour. Facebook sees 10 million photos
uploaded every hour, and a Like button is clicked or a comment
posted nearly 3 billion times per day.
8. NOW OR NEVER! (CONTINUED) The 800 million monthly users of
the YouTube service upload over an hour of video every second. The
number of messages on Twitter grows at around 200% a year, and in
2012 it exceeded 400 million tweets a day. From the sciences to
healthcare, from banking to the Internet, the sectors may be diverse,
yet together they tell a similar story: the amount of data in the
world is growing fast, outstripping not just our machines but our
imaginations as well. If we don't process and analyse this huge
amount of data now, we will never be able to harness its true
potential. So it's a question of now or never.
9. THE MORE THE MERRIER The challenge of processing large piles
of data accurately has been with us for a while. For most of
history we worked with only a little data, because our tools to
collect, organise, store and analyse large amounts of data were
poor. At first, statistical methods crunched only a sample of the
total data, which introduced error. Then came the practice of
picking random samples from the available data set, which reduced
the margin of error to about 3%. Big Data works on a sample space
where N = All, i.e. it uses all the available data, and hence we get
more accurate results. As data sets become larger and we obtain
access to greater amounts of data, the results become more and more
accurate. Hence, the more the merrier.
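The 3% figure for random samples and the "N = All" idea can be related through the textbook margin-of-error formula for a sampled proportion. The sketch below is an illustration under stated assumptions, not the presenters' own calculation: it assumes a simple random sample, a 95% confidence level (z = 1.96), the worst-case proportion of 0.5, and an optional finite population correction. The function name and parameters are hypothetical.

```python
import math
from typing import Optional


def margin_of_error(sample_size: int,
                    proportion: float = 0.5,
                    z: float = 1.96,
                    population_size: Optional[int] = None) -> float:
    """Approximate margin of error for a proportion estimated from a
    simple random sample (normal approximation)."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    if population_size is not None and population_size > 1:
        # Finite population correction: shrinks to 0 as the sample
        # approaches the whole population (N = All).
        standard_error *= math.sqrt(
            (population_size - sample_size) / (population_size - 1)
        )
    return z * standard_error


# A random sample of roughly a thousand people gives about 3%.
print(f"{margin_of_error(1067):.3%}")

# Larger samples shrink the margin; using every record removes sampling error.
print(f"{margin_of_error(10_000):.3%}")
print(f"{margin_of_error(5_000, population_size=5_000):.3%}")
```

This covers sampling error only. Measurement errors and inconsistent formatting in the data are a separate issue and do not vanish as the data set grows.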
10. THE MORE THE MERRIER! (CONTINUED) FareCast, a flight-price
prediction company, initially used a sample of 12,000 data points to
predict ticket prices according to the date of journey, and it
performed well. But as the company went on adding more data, the
quality of its predictions improved significantly. Steve Jobs
reportedly added 4-5 years to his life by having his entire genome
sequenced; with the information available from other patients' DNA
sequencing, doctors could devise his treatment. Steve Jobs called it
"jumping from one lily pad to another". He also added, "I'm either
going to be one of the first to be able to outrun a cancer like this
or I am going to be one of the last to die from it."
11. MESSINESS A POSITIVE FEATURE. Messiness refers to the
simple fact that the likelihood of errors increases as you add more
data points. It can also refer to inconsistent formatting, which
means the data needs to be cleaned before being processed. Big Data
deals with information at the macro level, where scale is a huge
factor, hence we can accept some messiness: we can sacrifice a bit
of accuracy in return for knowing the general trend. Examples
include natural language processing (Google Translate) and the
Billion Prices Project by MIT scientists and analysts.
12. CORRELATIONS Finding the relationships within the available
data can give us greater insight into the behavior of the entity
generating the data. At its core, correlation quantifies the
statistical relationship between data values. A strong correlation
means that when one of the data values changes, the other is highly
likely to change as well. Correlations help us to capture the
present and predict the future. The ability to predict with a
certain likelihood is extremely valuable. Amazon's recommendation
system used machine learning to find correlations between customers'
purchases and their future buys. Walmart's innovative sales
practices helped make it the world's largest retail chain.
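One common way to quantify the statistical relationship described on this slide is the Pearson correlation coefficient, which ranges from -1 (perfect inverse relationship) to +1 (perfect direct relationship). The sketch below is a plain-Python illustration; the sample figures are made up for demonstration and do not come from Amazon, Walmart or the presentation.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


# Hypothetical data: product page views versus purchases in the next week.
page_views = [10, 20, 30, 40, 50]
purchases = [1, 3, 4, 6, 9]
print(f"r = {pearson_r(page_views, purchases):.2f}")
```

A value close to +1 here would suggest that page views are a useful predictor of purchases, although it says nothing about why the two move together.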
13. AN EXPERT VIEW ON RELATIONAL INSIGHTS.
14. DATAFICATION Datafication refers to putting data in a
quantified format so that it can be tabulated and analyzed.
Digitizing data in its various forms and collecting it in structured
formats helps us in Big Data analysis. Analyzing this data gives us
useful insights into the behavioral shifts of customers. Videos and
photographs, when digitized, can help us gain insights into the
behavior patterns of users. Google's translation service digitized
95 billion lines of text from every book it could access and created
a robust and freely accessible database for searching.
15. DATA IS PRICELESS The value of data is immense. The earlier
preconception was that once data is used it loses its value and
becomes redundant, but with Big Data analysis tools this data can be
reused and can be worth billions of dollars, as will be evident from
the examples that follow. Amazon's partnership with AOL to improve
its e-commerce website. Facebook's behavioral analysis. The concept
of reCAPTCHA by Luis von Ahn (CAPTCHA stands for Completely
Automated Public Turing test to tell Computers and Humans Apart).
The commercial valuation of companies like Facebook and WhatsApp
rests chiefly on the data they can acquire.
16. DATA IS PRICELESS (CONTINUED) By typing the text in a
reCAPTCHA image you not only identify yourself as a human but also
decipher scanned words from The New York Times and from books on
Google Books. Unknowingly, you are helping Google to digitise its
library, saving Google around 1 billion USD every year.
17. THE DARK SIDE OF BIG DATA It threatens the privacy of
online users and their data. With more and more people expressing
themselves on the internet, their data is subject to misuse. A
world dominated by Big Data can lead to a situation where data acts
as a dictator and real human insight is compromised. As Big Data
only gives us the answer to "what" and not "how", the real reasons
behind events might be wrongly interpreted, which can jeopardize the
very purpose of the analysis. Companies sitting on a treasure trove
of data, like Google and Facebook, can manipulate and monopolize the
use of data and in the process harm general consumers.
18. TAMING THE BULL Technology is changing at an incredible
pace, and governments and netizens have not been able to anticipate
this and act accordingly. The government needs to respond to
changing cyber demographics and keep updating its cyber laws so that
they stay in sync with prevalent technological loopholes. Freedom of
speech is a constitutional guarantee, but this right comes with a
responsibility. People share information trusting that nobody will
misuse it, but in the age of Big Data the consent to use that data
for analysis lies with the companies. Changing the rules to empower
users to consent to information sharing is a step in the right
direction.
19. TAMING THE BULL (CONTINUED) In every field, from nuclear
technology to bioengineering, we first build tools that we discover
can harm us and only later set out to devise the safety mechanisms
to protect us from those new tools. In the case of big data as well,
these issues need to be addressed. Our task is to appreciate the
hazards of this powerful technology, support its development and
seize its rewards.
20. THE FUTURE STARTS TODAY! E-commerce and all other business
domains are on the verge of a big shift driven by big data and
intelligent technologies. This shift is towards a more efficient,
personalized, even automated customer journey. Emerging
personalization tools are designed to mimic the brain, leveraging
neural networks and deep learning.
21. THE FUTURE STARTS TODAY! (CONTINUED) Video surveillance
can gain a much wider application with the addition of behavioral
analysis algorithms, which can help retail stores step up sales.
Data recorded from sensors can be analysed and used in systems such
as anti-theft systems for cars and floor pressure mapping systems.
Adding Artificial Intelligence to Big Data analysis can answer not
only the "what" but also the "how" of things that happen.
22. HOW DOES IT HELP THE STUDENTS! The last decade of the IT
industry was mostly driven by technology, but this decade is
expected to rise on the back of information in the form of Big
Data. Thus the demand for data analysts and data scientists is on
the rise. It is estimated that around 4.4 million data analysts
would be required by 2020. Skills required to be a Big Data
professional
23. WANT TO EXPLORE? Follow this YouTube channel:
https://www.youtube.com/user/ibmbigdata Read the book: Big Data:
A Revolution That Will Transform How We Live, Work, and Think,
by Viktor Mayer-Schönberger and Kenneth Cukier.
24. ANY QUESTIONS??
25. BIBLIOGRAPHY https://www.youtube.com/user/ibmbigdata Big
Data: A Revolution That Will Transform How We Live, Work, and Think,
by Viktor Mayer-Schönberger and Kenneth Cukier.
www.google.com www.amazon.com www.en.wikipedia.org/Big_data
www.techcrunch.com
www.mckinsey.com/insights/big_data_the_next_frontier_for_innovation
Articles from IEEE Magazine. www.nytimes.com www.mit.edu And many
more.
26. A NOTE OF THANKS! We express our heartfelt gratitude to our
faculty members for providing us with this opportunity to get into
a new subject and delve deeper into it. Thank you for your patience
and time. If you want to download this presentation, follow the link
www.slideshare.net/ashishmishraoders/big-data-a-technological-revolution