Date posted: 17-Dec-2015
Category: Documents
Uploaded by: miles-conley
Big Data… Big Opportunities? … Big Hype? (or just a Big Mess?)
Data challenges and IBM views
Dr. Matthew Ganis
IBM Senior Technical Staff Member
CIO Social Media Analytics Chief Architect
Member, IBM Academy of Technology
[email protected]
@mattganis (Twitter)
The term “Big Data” is pervasive, but it still provokes a bit of confusion.
So what is it?
Big Data has been used to convey all sorts of concepts, including huge quantities of data, social media analytics, next-generation data management capabilities, real-time data and much, much more.
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
Information is at the Center of a New Wave of Opportunity…
2009: 800,000 petabytes
2020: 35 zettabytes
44x as much data and content over the coming decade
80% of the world’s data is unstructured
… And Organizations Need Deeper Insights
1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
1 in 2 business leaders say they don’t have access to the information they need to do their jobs
83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
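The 44x growth figure can be checked against the two volumes quoted for 2009 and 2020. The short sketch below does the unit conversion, assuming decimal (SI) units in which 1 zettabyte equals 1,000,000 petabytes; the variable names are illustrative only.

```python
# Assumption: decimal (SI) units, so 1 zettabyte = 1,000,000 petabytes.
PETABYTES_PER_ZETTABYTE = 1_000_000

data_2009_pb = 800_000  # volume quoted for 2009, in petabytes
data_2020_zb = 35       # volume quoted for 2020, in zettabytes

# Convert the 2009 figure to zettabytes so both values share a unit.
data_2009_zb = data_2009_pb / PETABYTES_PER_ZETTABYTE
growth_factor = data_2020_zb / data_2009_zb

print(f"2009 volume: {data_2009_zb} ZB")
print(f"Growth factor: {growth_factor:.2f}x, which rounds to {round(growth_factor)}x")
```

Under that assumption the ratio comes out just under 44, consistent with the rounded figure quoted with the volumes.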
Structured vs Unstructured
Structured data is information with a high degree of organization, so that it fits seamlessly into a relational database and is readily searchable by simple, straightforward search algorithms or other search operations. Unstructured data is essentially the opposite.
The lack of structure makes compilation a time- and energy-consuming task.
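To make the contrast concrete, here is a minimal sketch using made-up sales data (the table name, column names and notes are all assumptions for illustration). The structured rows can be queried directly with SQL, while the same facts written as free text first need a schema imposed on them by ad hoc parsing.

```python
import re
import sqlite3

# Structured: rows with a fixed schema, loaded into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("West", 75.5), ("East", 30.0)],
)
east_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("East",)
).fetchone()[0]
conn.close()

# Unstructured: free text with no schema; we have to extract fields ourselves.
notes = [
    "Sold about $120 of product in the east today.",
    "West coast order, 75.50 dollars, customer was happy.",
    "East again: another 30 bucks.",
]
amount_pattern = re.compile(r"\d+(?:\.\d+)?")

extracted = []
for note in notes:
    lowered = note.lower()
    region = "East" if "east" in lowered else "West" if "west" in lowered else None
    match = amount_pattern.search(note)
    extracted.append((region, float(match.group()) if match else None))

print("East total from the table:", east_total)
print("Fields recovered from text:", extracted)
```

The keyword and regular-expression rules work only for these three notes; real free text would need far more effort, which is the point about time and energy made above.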
The Challenge: Bring Together a Large Volume and Variety of Data to Find New Insights
Identify criminals and threats from disparate video, audio, and data feeds
Make risk decisions based on real-time transactional data
Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
Detect life-threatening conditions at hospitals in time to intervene
Multi-channel customer sentiment and experience analysis
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
• Business Users: Determine what question to ask
• IT: Structures the data to answer that question
• Examples: Monthly sales reports, Profitability analysis, Customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT: Delivers a platform to enable creative discovery
• Business Users: Explore what questions could be asked
• Examples: Brand sentiment, Product strategy, Maximum asset utilization
Structured vs. Exploratory
The Internet of Things (IoT) is a scenario in which objects, animals or people are provided with unique identifiers and the ability to automatically transfer data over a network without requiring human-to-human or human-to-computer interaction.
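The two ingredients of that definition, a unique identifier and automatic data transfer, can be illustrated with a small sketch. Everything here is hypothetical: the `Thing` class, the field names and the in-memory `inbox` that stands in for a network are assumptions, not part of any IoT product or protocol.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class Thing:
    """A hypothetical connected object (sensor, tag, wearable)."""
    kind: str
    thing_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def report(self, value: float) -> str:
        """Serialize one reading; the unique ID travels with every message."""
        return json.dumps(
            {"thing_id": self.thing_id, "kind": self.kind, "value": value}
        )


def transfer(messages, inbox):
    """Stand-in for the network: deliver each message to a collector's inbox."""
    for message in messages:
        inbox.append(json.loads(message))


things = [Thing("thermostat"), Thing("cattle-tag"), Thing("fitness-band")]
inbox = []
transfer((t.report(v) for t, v in zip(things, [21.5, 38.9, 72.0])), inbox)

for record in inbox:
    print(record["thing_id"][:8], record["kind"], record["value"])
```

No person types or requests anything in this loop; each object identifies itself and its readings simply arrive at the collector.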
What are we running?
Who is talking about us? Male / Female / Student / Professional / Retired / Customers?
What do they “feel”? Positive / Negative Sentiment / Angry / Annoyed?
Where are they talking?
Who are they influencing? Who’s listening to them?
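As a rough illustration of the "what do they feel" and "where are they talking" questions, the sketch below tags a few invented posts with a toy word-list sentiment and counts them per channel. The word lists, channel names and posts are assumptions made up for this example; production sentiment analysis is considerably more involved.

```python
from collections import Counter

# Toy lexicons, invented for this example only.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"hate", "angry", "annoyed", "bad", "broken"}


def sentiment(post: str) -> str:
    """Classify a post by counting distinct positive and negative words."""
    words = {w.strip(".,!?").lower() for w in post.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# (channel, text) pairs standing in for collected social media chatter.
posts = [
    ("twitter", "I love the new release, great job!"),
    ("forum", "Really annoyed, the installer is broken."),
    ("twitter", "Downloading it now."),
]

by_channel = Counter((channel, sentiment(text)) for channel, text in posts)
for (channel, label), count in sorted(by_channel.items()):
    print(channel, label, count)
```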
When customers are talking about us or about our products we want to know where those conversations are happening so we can:
• Interact with interested customers
• Get in front of any issues
Numerous studies show that consumers see word-of-mouth and personal recommendations as far more credible than newspaper and television advertisements. Such mass advertisements are still necessary because of their powerful reach, but these findings show that companies need to increase their focus on more personalized approaches. Clearly, it is incredibly difficult, maybe even impossible, for most companies to deal directly with countless potential consumers. This is where influencers come in…
What makes someone influential?
The number of tweets they make? The number of times people mention them?
The number of followers they have? How often they are retweeted?
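Each of these questions can point to a different person. The sketch below ranks two invented accounts by each candidate measure; the handles, the numbers and the "retweets per tweet" ratio are assumptions for illustration, not a recommended influence metric.

```python
from dataclasses import dataclass


@dataclass
class Author:
    handle: str
    tweets: int
    mentions: int
    followers: int
    retweets: int


# Two made-up accounts with very different profiles.
authors = [
    Author("@broadcaster", tweets=5000, mentions=40, followers=90000, retweets=500),
    Author("@expert", tweets=300, mentions=250, followers=4000, retweets=900),
]

# One candidate measure per question asked above.
metrics = {
    "tweets made": lambda a: a.tweets,
    "times mentioned": lambda a: a.mentions,
    "followers": lambda a: a.followers,
    "retweets per tweet": lambda a: a.retweets / max(a.tweets, 1),
}


def top_by(metric_name: str) -> str:
    """Return the handle that ranks first on the given measure."""
    return max(authors, key=metrics[metric_name]).handle


for name in metrics:
    print(f"Most influential by {name}: {top_by(name)}")
```

With these numbers the high-volume account wins on tweets and followers while the smaller account wins on mentions and retweets per tweet, which is why the choice of measure matters.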
We were asked to look at why a particular product launch wasn’t performing as expected. We pulled all the “chatter” about it and found:
Where is all this data coming from?
It is true that vast amounts of data are and will be generated from financial transactions, medical records, mobile phones, social media and the Internet of Things, but there are questions that need to be asked to understand data’s meaningful use:
• How will data be managed?
• How will data be shared?
Some thoughts about “data as a service”
• Establishment of standards, governance, guidelines (e.g., open architectures)
• Creation of industry-specific data exchanges (e.g., healthcare data exchanges, environment data exchanges, etc.)
• Creation of cross-industry data exchanges (e.g., healthcare data exchanges seamlessly interacting with environmental data exchanges, etc.)
Enterprise Integration
• Trusted Information & Governance: Companies need to govern what comes in, and the insights that come out
• Data Management: Insights from Big Data must be incorporated into the warehouse
[Diagram: Traditional Sources and New Sources feed the Data Warehouse and the Big Data Platform, linked by Enterprise Integration]
Data variety - trying to accommodate data that comes from different sources and in a variety of different forms (images, geo data, text, social, numeric, etc.).
How do we link them together? Is there a common taxonomy or way to organize it? Is there a “signal” in one source of data that points to another?
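One simple way to picture a common taxonomy is a lookup that maps each source's own labels onto shared keys, so records from different sources can be grouped together. The sketch below is a toy under stated assumptions: the `TAXONOMY` mapping, the sales rows and the social posts are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific labels to one shared taxonomy.
TAXONOMY = {
    "NYC": "new-york",
    "#nyc": "new-york",
    "BOS": "boston",
    "#boston": "boston",
}

# Two sources describing the same places in different forms.
sales_rows = [{"store": "NYC", "units": 40}, {"store": "BOS", "units": 12}]
social_posts = [
    {"tag": "#nyc", "text": "Long queue at the store today"},
    {"tag": "#boston", "text": "Shelves look empty"},
    {"tag": "#nyc", "text": "Great launch event"},
]

# Group both sources under the shared taxonomy key.
linked = defaultdict(lambda: {"units": 0, "posts": []})
for row in sales_rows:
    linked[TAXONOMY[row["store"]]]["units"] += row["units"]
for post in social_posts:
    linked[TAXONOMY[post["tag"]]]["posts"].append(post["text"])

for place, info in sorted(linked.items()):
    print(place, info["units"], len(info["posts"]))
```

Once both sources share a key, a pattern in one (for example a burst of posts) can be checked against the other (units sold), which is one reading of a "signal" pointing from one source to another.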
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
The Big Data Opportunity
Variety: Manage the complexity of multiple relational and non-relational data types and schemas
Velocity: Streaming data and large volume data movement
Volume: Scale from terabytes to zettabytes (1B TBs)
Big Data: why is it possible now?
Traditional approach: Data to Function
[Diagram: a user request reaches the application server, which queries the database server; the database server returns the data, and the application server processes the data and sends the result.]
Big Data approach: Function to Data
[Diagram: a user request reaches the master node, which sends the function to process on the data to the data nodes; each data node queries and processes its own data; the results are consolidated and the result is sent.]
Traditional approach
• Application server and Database server are separate
• Data can be on multiple servers
• Analysis Program can run on multiple Application servers
• Network is still in the middle
• Data have to go through the network
Big Data approach
• Analysis Program runs where the data are: on the Data Nodes
• Only the Analysis Program has to go through the network
• Analysis Program needs to be MapReduce aware
• Highly scalable: 1000s of nodes, petabytes and more
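The "function to data" idea can be sketched in a few lines. This is a single-process simulation, not a real MapReduce framework: the `data_nodes` dictionary, the node names and the helper functions are assumptions standing in for a cluster, and the word count is just a convenient example of an analysis program.

```python
from collections import Counter
from functools import reduce

# Each "data node" holds its own partition of the data (simulated in memory).
data_nodes = {
    "node-1": ["big data big hype", "big mess"],
    "node-2": ["data data everywhere"],
    "node-3": ["big opportunities"],
}


def map_word_count(lines):
    """Function shipped to a node: counts words in that node's local lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def run_on_nodes(function, nodes):
    """Master node role: run the function on every partition, collect partials."""
    return [function(partition) for partition in nodes.values()]


def consolidate(partials):
    """Reduce step: merge the small partial results into one answer."""
    return reduce(lambda a, b: a + b, partials, Counter())


total = consolidate(run_on_nodes(map_word_count, data_nodes))
print(total.most_common(2))
```

Only the function and the small per-node counts would cross the network in a real deployment; the raw lines stay on the nodes that hold them, which is what lets the approach scale to many nodes.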