SAS adds a V: Visualization.
Of course, the objective is to get Value, the ultimate V.
12 Vs have been reported.
Slide 11
Definition of Big Data
"Big Data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis." -- IDC
"Data is the new oil." -- European Consumer Commissioner Meglena Kuneva
Slide 12
Slide 13
The Human Face of Big Data
Slide 14
Digital Universe
Slide 15
Gartner Technology Hype Cycle
Slide 16
Gartner Technology Hype Cycle
Slide 17
Big Data Technologies
Slide 18
Hadoop/MapReduce
Was driven by the need to index the web.
Existing technology did not scale.
The MapReduce framework was developed at Google.
Yahoo! built Hadoop on the Map/Reduce framework.
Note: a recent survey indicates that only 16% of companies are using the Hadoop/MapReduce environment, which is dominated by the online companies.
Slide 19
Hadoop
The Storage Layer (HDFS)
Hadoop Distributed File System: software to distribute data across multiple computing nodes.
Typically runs on top of Linux.
Stores each block 3 times, hopefully with one copy on a node in a different rack.
Sequential access: write once, read many.
Optimized for streaming; no random access.
No predefined schema; any data type.
Slide 20
Hadoop (cont)
The Execution Layer (Map/Reduce)
Responsible for running a batch job in parallel on many servers.
Typically runs on top of Linux.
Works with (key, value) pairs.
For a job:
Mappers pull data from their respective files.
Mapper feeds Shuffle (may not be needed).
Shuffle feeds Reducer, which summarizes and returns the result.
Java is the native language.
Slide 21
Map Reduce Example
Five files, each with two columns of (key, value) pairs: city, max temperature.
Example: Toronto, 20; Whitby, 25
Problem: Find the maximum temperature for each city.
Break down into 5 mapper tasks; the results of the mapper tasks are:
(Toronto, 20) (Whitby, 25) (New York, 22) (Rome, 33)
(Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37)
(Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38)
(Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31)
(Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30)
The mapper task results feed into reduce tasks, which combine the input results and output a single value for each city:
(Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38)
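The shuffle and reduce steps of this example can be sketched in plain Python. This is only an illustration that runs on one machine, not Hadoop code: the names MAPPER_OUTPUTS, shuffle and reduce_max are made up for the sketch, and the mapper results are copied from the slide rather than computed from the five input files.

```python
from collections import defaultdict

# Mapper task results as listed on the slide, one list per mapper task.
MAPPER_OUTPUTS = [
    [("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 33)],
    [("Toronto", 18), ("Whitby", 27), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("Whitby", 20), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("Whitby", 19), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("Whitby", 22), ("New York", 19), ("Rome", 30)],
]


def shuffle(mapper_outputs):
    """Group every value emitted by the mappers under its key (the city)."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for city, temperature in output:
            grouped[city].append(temperature)
    return grouped


def reduce_max(grouped):
    """Reduce each city's list of temperatures to a single maximum."""
    return {city: max(temperatures) for city, temperatures in grouped.items()}


result = reduce_max(shuffle(MAPPER_OUTPUTS))
print(result)
```

In a real cluster the mappers and reducers would run on different nodes; here the three stages are ordinary function calls so the data flow is easy to follow.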
HANA In-Memory Computing Speed Demo
Backdrop:
Oracle World demo: put all of Wikipedia into Oracle 12c with the in-memory option.
SAP Tech Ed, a few weeks later: put all of Wikipedia into an SGI HANA box (250 billion rows).
Query and plot of Wikipedia page views of AIDS versus Ebola by date.
Forecast of Wikipedia page views of AIDS versus Ebola by date.
Slide 26
Slide 27
Slide 28
Myth or Reality
RAM is so inexpensive, it is a no-brainer to move to in-memory computing?
In-memory computing is an expected evolution in the digital universe?
In-memory computing tenet: RAM is the new DISK; DISK is the new TAPE.
Slide 29
MarketWatch Cases
On-line Gambling: Increasing number of online bets per second from 20,000 to 150,000 (Bwin.Party)
Education: Near real-time analytics driving intervention for improving retention (University of Kentucky)
Health Care: Intersection of smart devices, electronic health care records and in-memory analytics to provide real-time diagnostics and treatment (McKinsey & Company)
Global package company: Move to real-time tracking of packages
Slide 30
Thoughts on In-Memory Computing
In-Memory Computing makes Big Data possible.
Insight at the speed of thought.
IMDBMS reduces data footprint:
Eliminates aggregates.
Compression is higher for columns than for rows.
Optimized for RAM instead of optimized for disk.
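One way to see why column layouts tend to compress better than row layouts is to count runs of repeated values. The sketch below uses run-length encoding, which is a choice made for this illustration and is not named on the slide; the table contents and all function and variable names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical table of (country, city) rows, sorted by country.
rows = [
    ("CA", "Toronto"),
    ("CA", "Toronto"),
    ("CA", "Whitby"),
    ("US", "New York"),
    ("US", "New York"),
    ("IT", "Rome"),
]

# Row layout: values stored row by row, so neighboring values rarely repeat.
row_layout = [value for row in rows for value in row]

# Column layout: each column stored contiguously, so repeats sit side by side.
country_column = [country for country, _ in rows]
city_column = [city for _, city in rows]

row_runs = len(run_length_encode(row_layout))
column_runs = (len(run_length_encode(country_column))
               + len(run_length_encode(city_column)))

print("runs in row layout:", row_runs)
print("runs in column layout:", column_runs)
```

Fewer runs means fewer (value, count) pairs to store. Real in-memory databases use other schemes as well, such as dictionary encoding, so this is only a toy comparison.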
Slide 31
Two Factors Will Drive In-Memory Computing Faster than Planned
Automated Decision-Making
Mobile Computing
Slide 32
A Data Scientist
Slide 33
Another View of a Data Scientist
Slide 34
So How Do I Find One?
Slide 35
Big Data is disruptive in the following ways:
It brings grid and in-memory computing to business.
Software is being moved to the data instead of moving the data to the software.
Transition from analytics at rest to analytics in motion.
Will create new demand for workers with analytics skills.
Slide 36
Big Data is really about Analytics
Slide 37
A View of Analytics
Source: mu-sigma
Slide 38
Another View of Analytics
Source: Rose Business Technologies
Slide 39
Another View of Analytics
Basic Reporting: What happened?
Ad Hoc Reporting: How many, how often, where?
Dynamic Reporting: Where exactly are the problems?
Reporting with Early Warning: What actions are needed?
Basic Statistical Analysis: Why is this happening?
Forecasting: What if these trends continue?
Predictive Modeling: What will happen next?
Decision Optimization: What is the best decision?
Chart labels: Competitive Advantage; Data, Information, Intelligence; Basic Analytics, Advanced Analytics; Reporting, Decision Support, Decision Guidance
Source: Achieving Success with Business Analytics
Slide 40
Another View of Analytics
Slide 41
Another View of Analytics
Slide 42
Cognitive Computing?
Watson gains eyes, ears and a voice
Slide 43
The Importance of Big Data and Analytics
Wall Street Journal, 9/16/13:
44% of CIOs consider Business Intelligence the top priority for technology spending.
51% of companies plan to increase spending on Business Intelligence and Analytics software this year.
A recent McKinsey report:
Considers Big Data "the next frontier for competition".
The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.
Do you need a Data Scientist?
Slide 44
The Importance of Big Data and Analytics
Slide 45
Data Driven Decisions
Analytics, The New Path to Value (MIT research report: 30 industries, 100 countries)
Analytics is the differentiator for the top performing companies (chart on next slide).
Data is not the problem.
Slide 46
Slide 47
5 Stages of Big Data and Analytics Maturity
Slide 48
Current State
Slide 49
There is a Journal
Big Data Journal
Word (Tag) Cloud
Word Cloud with Images
Easy Text Manipulation
http://www.ibm.com/analytics/watson-analytics/
https://ace.ng.bluemix.net/
http://www.biography.com/people/warren-buffett-9230729#synopsis
Slide 50
Slide 51
Of Interest
Social Bakers
Amazing Twitter Stats
Google Trends
Social media location adds considerable opportunity.
Slide 52
Implications
Slide 53
The Analytics
At rest (static): Models, including predictive models, using historical data.
In-motion (real-time): Using models on a stream feed.
Combination: Uses models on a stream feed; the stream feed goes into the data at rest to update the models.
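The three modes can be sketched with a toy model. Everything here is an assumption for illustration: the RunningMeanModel class, the score_stream function, the threshold and the sample numbers are hypothetical and do not come from any specific product.

```python
class RunningMeanModel:
    """Toy model: the prediction is the mean of all values seen so far."""

    def __init__(self, history):
        # At rest: the model is fitted once on historical data.
        self.count = len(history)
        self.total = float(sum(history))

    def predict(self):
        return self.total / self.count

    def update(self, value):
        # Fold a streamed value back into the data at rest.
        self.count += 1
        self.total += value


def score_stream(model, stream, threshold, update_model=False):
    """Flag streamed values that deviate from the model's prediction.

    update_model=False: in motion, a fixed model applied to a stream feed.
    update_model=True: combination, the stream also updates the model.
    """
    flagged = []
    for value in stream:
        if abs(value - model.predict()) > threshold:
            flagged.append(value)
        if update_model:
            model.update(value)
    return flagged


history = [10, 12, 11, 9, 13]  # hypothetical historical readings
stream = [11, 30, 12, 31]      # hypothetical live feed

static_model = RunningMeanModel(history)
in_motion = score_stream(static_model, stream, threshold=10)

combined_model = RunningMeanModel(history)
combined = score_stream(combined_model, stream, threshold=10, update_model=True)

print(in_motion, static_model.predict())
print(combined, combined_model.predict())
```

With the combination, the model's mean drifts toward the recent stream values, while the static model keeps the mean it learned from the historical data.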
Slide 54
Analytics at Rest / Analytics in Motion
Slide 55
Thoughts
It is not a matter of if but when you get into Big Data analytics.
Purpose is to provide enablement for users.
Choices:
Pure plays like Cloudera, Hortonworks, MapR, Pivotal, etc.
NoSQL databases (key-value, documents, networks)
Major computing players like IBM, Oracle, etc.
In-Memory Computing
Should not be a new silo.
Slide 56
Terms
IoT: Internet of Things
IoE: Internet of Everything
IoN: Internet of Nothing
The vast majority of the billions of things connected to the internet on Cisco's website, for instance, are not the toasters, refrigerators, thermostats, smoke detectors, pace-makers and insulin pumps that the IoT's true believers enthuse about. Almost exclusively, they are existing smartphones, tablets, computers and routers, plus a surprising number of industrial components used to beam performance statistics back to corporate headquarters. Without any hoopla, operators of power stations, passenger jets, railways, refineries, chemical plants, oil platforms and other industrial equipment have been doing this for ages.
Slide 57
EMC Digital Universe with Research & Analysis by IDC
The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, April 2014
40% of data created and consumed by consumers