Date posted: 14-Jul-2015
Category: Technology
Upload: inside-analysis
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Data Wrangling and the Art of Big Data Discovery
Twitter Tag: #briefr The Briefing Room
Mission
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today’s innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
Topics
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
Should I Bring My Tools?
- Hammers aren’t good for plumbing!
- Big Data requires a new set of tools
- Preparing and Exploring are very different
- Don’t throw out your old tool box!
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
@robinbloor
Trifacta and Zoomdata
Trifacta offers a platform for data transformation and preparation
The interface is rich in visualization and provides previews and recommendations
The platform also includes a learning layer which employs machine learning algorithms to facilitate automation and self-learning
Zoomdata is a Big Data exploration, visualization and analytics platform
The platform offers a wide range of analytics and BI tools, such as dashboards, stream processing and IoT analytics
Its pre-built connectors allow the Zoomdata server to connect directly to data sources
Guests:
Russ Cosentino is Vice President of Marketing & Business Development at Zoomdata. Throughout his career he has focused on developing solutions that leverage technology to solve business problems. His experience includes application development for mission-critical systems for the DoD, automated recruitment programs for the intelligence community, and the application of text analytics for commercial VOC programs.
Dr. Joe Hellerstein is Trifacta’s Chief Strategy Officer and a Professor of Computer Science at Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune Magazine included him in its list of the 50 smartest people in technology, and MIT Technology Review included his Bloom language for cloud computing on its TR10 list of the 10 technologies “most likely to change our world.”
DATA WRANGLING AND THE ART OF BIG DATA DISCOVERY
Dr. Joe Hellerstein: Professor, EECS Computer Science Division, UC Berkeley; Co-founder & Chief Strategy Officer, Trifacta
Russ Cosentino: Vice President, Marketing & Business Development, Zoomdata
Founded in 2012, from Berkeley/Stanford research roots
dp = data to the people: “facilitating interactions between people and data throughout the analytic lifecycle”
Stanford Visualization Group’s “Data Wrangler”
Elegant solutions for a messy world: the 80% problem of preparing data for exploratory analytics
TRADITIONAL APPROACH TO DATA MANAGEMENT
[Diagram labels: Data Sources (Structured); Ingest; ETL; ELT; Storage #1, 2, N; Store & Process; Enterprise Data Warehouse (EDW); Archive; Serve; Access Data; Analyze Data (Search, Statistical, Machine Learning, SQL); Implement; Optimize; Custom Application; Point Solution]
MANY PEOPLE INVOLVED IN THE PROCESS
DATA ARCHITECT
DATABASE ADMINISTRATOR
SYSTEM ADMINISTRATOR
BUSINESS ANALYST
BI ADMINISTRATOR
SYSTEM ADMINISTRATOR
IT COULD BE SIMPLER
DATABASE ADMINISTRATOR
BUSINESS ANALYST
MODERN DATA AND VISUALIZATION ENVIRONMENT
[Diagram labels: Data Sources (Structured, Unstructured); Ingest; Store & Process; Data Preparation; Serve; Visualization]
REAL BENEFITS OF A SELF-SERVICE APPROACH
Real-Time: +15% Cash Increase, +26% Pipeline Growth, -67% Cost Reduction
Interactive: +70% Collaboration, +64% Decision Speed, +61% User Adoption
Big Data: +48% Speed of Delivery, +42% Self-Service Access, +40% Decision Quality
DEMONSTRATION
MODERN DATA ARCHITECTURE FOR SELF-SERVICE INTELLIGENCE
Perceptions & Questions
Analyst: Robin Bloor
I am not a number!
To Round-Up & Wrangle
Robin Bloor, PhD
The Flow of Data
The movement of data from ACQUISITION through PREPARATION to ANALYSIS is not necessarily simple…
The General Picture
[Diagram labels: Data Sources; Data Streams; ROUND-UP; WRANGLING; Staging Area (Hadoop); ETL; Data Warehouse or other location; Analytics; Metadata Discovery; Metadata Mgt; Data Cleansing; Data Lineage; MDM; Service Mgt; Life Cycle Mgt]
Immediate Analytics & the Rest
[Diagram: repeated from The General Picture]
IMMEDIATE ANALYTICS
- Metadata discovery
- Metadata management
- Data cleansing
- Data lineage
DOWNSTREAM
- MDM
- Service mgt
- Lifecycle mgt
- ETL
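As a rough illustration of what "metadata discovery" on the immediate-analytics side can involve, the sketch below infers a type and a missing-value count for each column of a small CSV sample. It is not how Trifacta or Zoomdata work; the function names (`infer_type`, `discover_metadata`) and the sample data are hypothetical, and only the Python standard library is used.

```python
import csv
import io


def infer_type(values):
    """Guess a column type ("int", "float", "str", or "empty") from string values."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "empty"
    # Try the strictest type first; fall through to "str" if nothing fits.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def discover_metadata(csv_text):
    """Return {column: {"type": ..., "missing": ...}} for a CSV string with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {
        col: {
            "type": infer_type([(r[col] or "") for r in rows]),
            "missing": sum(1 for r in rows if not (r[col] or "").strip()),
        }
        for col in columns
    }


# Hypothetical sample: one missing amount and one missing region.
SAMPLE = "id,amount,region\n1,10.5,east\n2,,west\n3,7,\n"

for column, info in discover_metadata(SAMPLE).items():
    print(column, info)
```

A profile like this is the kind of summary a data cleansing step could act on, for example by flagging the columns with missing values before analysis.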
The Analytics Business Process
- The main point to note is that it is iterative
- It has morphed because of:
  - Data availability
  - Parallel technology
  - Scalable software
  - Open source tools
  - Machine learning
[Cycle diagram: Data Access, Data Prep, Model, Analyze, Deploy, Execute]
Analytical Latencies
1. Data access
2. Data preparation
3. Model development
4. Execution
5. Implementation
6. Model audit & update
This is where the rubber meets the road: Speed = Value
The Impending Reality
Technology is speeding up analytics by TWO ORDERS OF MAGNITUDE (on the IT side)
This is changing analytics
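A minimal sketch of why a two-orders-of-magnitude speed-up on the IT side changes where the time goes: it sums the six analytical latencies and then applies a 100x speed-up to some of them. The hour figures are made up for illustration (the slides give none), and which stages count as "the IT side" is an assumption.

```python
# Hypothetical latencies in hours for the six stages listed under
# "Analytical Latencies". The slides give no figures; these are illustrative.
STAGE_HOURS = {
    "data access": 40.0,
    "data preparation": 80.0,
    "model development": 20.0,
    "execution": 10.0,
    "implementation": 30.0,
    "model audit & update": 20.0,
}

# Assumption: the slides do not say which stages are "on the IT side".
IT_SIDE = frozenset({"data access", "data preparation", "execution"})


def total_latency(stage_hours, accelerated=frozenset(), speedup=1.0):
    """Sum stage latencies, dividing the accelerated stages by `speedup`."""
    return sum(
        hours / speedup if stage in accelerated else hours
        for stage, hours in stage_hours.items()
    )


before = total_latency(STAGE_HOURS)
after = total_latency(STAGE_HOURS, accelerated=IT_SIDE, speedup=100.0)
print(f"before: {before:.1f} h, after: {after:.1f} h, overall: {before / after:.1f}x")
```

With these made-up numbers, a 100x gain on the assumed IT-side stages gives less than a 3x gain end to end, because the stages that were not accelerated now dominate the cycle.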
- Is your capability only relevant to analytics or does it have broader areas of application?
- Technically, what makes it fast?
- Please comment on analytical workloads:
  - What do you see as the natural IT bottlenecks?
  - What do you see as the natural business bottlenecks?
- Do we want business analysts to become ersatz data scientists?
- With respect to scale, what is your largest implementation by data volume, and what was the industry sector/problem space?
- Who do you partner with?
- What do you see as the largest barrier to adoption of Trifacta?
Upcoming Topics
www.insideanalysis.com
March: BI/ANALYTICS
April: BIG DATA
May: CLOUD
THANK YOU for your ATTENTION!
Some images provided courtesy of Wikimedia Commons and Wikipedia, including: "Multiple pliers" by Ed Stevenhagen from nl. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Multiple_pliers.jpg#mediaviewer/File:Multiple_pliers.jpg