Posted on 29-Nov-2014
@Kognitio #SparkEvent
Hadoop meets Mature BI: Where the rubber meets the road for the Modern Data Platform
Michael Hiskey, Futurist, Product Evangelist (and VP, Marketing and Business Development)
www.kognitio.com
Today, and the Future
Big Data
Advanced Analytics
In-memory
Modern Data Platform
Hybrid Data Ecosystem ‘Logical Data Warehouse’
Predictive Analytics
Data Scientists
Data
64% have invested / plan to invest in Big Data tech
8% have started using it (via TechCrunch, 23 Sept 2013)
Average TBs of stored data: 200, 2x the Walmart DW in 1999 (Insights & Publications, May 2011)
The Data Scientist: Sexiest job of the 21st Century?
Data Scientist
The Analytical Enterprise
Business Analyst
Systems Admin
Remember: Decision Support Systems?
…accessed with ease and simplicity
Historical information, latency
BI tools have plateaued
Advanced analytics & data science
More math…a lot more math
create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES … )
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr( # Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header=FALSE, row.names …
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), …
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
…
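The R script above divides each day's sales by that day of the week's share of total sales and then fits a straight line to the resulting "base" sales against the row index. A rough Python equivalent is sketched below for readers without an R environment. It assumes the truncated aggregate() call sums sales per day of week and that every day of week has nonzero sales; the function names and the (dow, row_id, daily_sales) tuple layout are hypothetical, not part of Kognitio's interface.

```python
from collections import defaultdict


def dow_shares(rows):
    """Share of total sales falling on each day of week (like daily1 in the R script).

    Assumes the truncated aggregate() call sums sales per DOW.
    rows: list of (dow, row_id, daily_sales) tuples (hypothetical layout).
    """
    totals = defaultdict(float)
    for dow, _row_id, sales in rows:
        totals[dow] += sales
    grand_total = sum(totals.values())
    return {dow: total / grand_total for dow, total in totals.items()}


def least_squares(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_base_sales(rows):
    """Deseasonalize by day-of-week share, then fit base sales against row index."""
    shares = dow_shares(rows)
    ids = [row_id for _dow, row_id, _sales in rows]
    base_sales = [sales / shares[dow] for dow, _row_id, sales in rows]
    return least_squares(ids, base_sales)


# Two weeks with a fixed weekly pattern and no trend: the fitted slope is ~0.
rows = [(i % 7, i, float(i % 7 + 1)) for i in range(14)]
intercept, slope = fit_base_sales(rows)
print(round(intercept, 6), round(slope, 6))
```

Because each product's rows arrive as one isolated partition, the same function would run once per PRODNO, mirroring "partition by PRODNO" in the script definition.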
Behind the numbers
What has changed?
More connected-users?
More-connected users?
Don’t be a Railroad Stoker! Highly skilled engineering was required… but the world innovated around them.
The drive for deeper understanding
(Chart axes: Analytical Complexity vs. Dynamic Interaction, Technology/Automation)
Machine learning algorithms
Dynamic Simulation
Statistical Analysis
Clustering
Behavior modelling
Reporting & BPM
Fraud detection
Campaign Management
Key: “Graduation”
Projects will need to graduate from the Data Science Lab and become part of Business as Usual
Your goal:
PRESS HERE…and really cool Big Data stuff happens!
Data flow
© 20th Century Fox
No need to pre-process
No need to align to schema
No need to triage
Null storage concerns
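The points above describe what is usually called schema-on-read: raw data lands in Hadoop as-is, and structure is applied only when the data is queried. A minimal Python sketch of the idea follows, assuming newline-delimited JSON records; the `read_with_schema` helper is hypothetical (not a Hadoop API). Fields absent from a record simply come back as None instead of having to be aligned up front.

```python
import json

# Raw events stored exactly as they arrived: no pre-processing,
# no up-front schema alignment, and records may carry different fields.
RAW_EVENTS = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart"}',
    '{"user": "c", "ms": 340, "referrer": "email"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time (hypothetical helper).

    schema maps column name -> conversion function; missing fields become None,
    and fields not named in the schema are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            col: (convert(record[col]) if col in record else None)
            for col, convert in schema.items()
        }


# Two different views over the same raw data, chosen only at query time.
timing_view = list(read_with_schema(RAW_EVENTS, {"user": str, "ms": int}))
page_view = list(read_with_schema(RAW_EVENTS, {"user": str, "page": str}))
print(timing_view)
print(page_view)
```

The trade-off is that parsing and validation cost moves from load time to every query, which is part of why query latency becomes the bottleneck for interactive BI.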
Hadoop just too slow for interactive BI!
…loss of train-of-thought
“while Hadoop shines as a processing platform, it is painfully slow as a query tool”
Hadoop is…
inherently disk oriented, with typically a low ratio of CPU to disk
Lots of these (disks)
Not so many of these (CPUs)
Analytics needs low latency, no I/O wait
High-speed in-memory processing
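The contrast between disk-bound processing and in-memory analytics can be illustrated with a toy Python sketch: data is read from persistent storage once, held in memory, and each follow-up query scans RAM instead of waiting on I/O. The `InMemoryTable` class, its column names, and its methods are hypothetical; a real in-memory MPP platform also spreads data and work across many nodes and cores, which this single-process example does not attempt.

```python
import csv
import io
from collections import defaultdict


class InMemoryTable:
    """Load rows once from persistent storage, then answer queries from RAM."""

    def __init__(self, source):
        # The single I/O-bound step: read and parse everything up front.
        reader = csv.DictReader(source)
        self.rows = [
            {"region": r["region"], "product": r["product"], "sales": float(r["sales"])}
            for r in reader
        ]

    def total_by(self, column, predicate=lambda row: True):
        """Group-and-sum over the in-memory rows; no disk access per query."""
        totals = defaultdict(float)
        for row in self.rows:
            if predicate(row):
                totals[row[column]] += row["sales"]
        return dict(totals)


# io.StringIO stands in for a file or Hadoop extract in this sketch.
source = io.StringIO(
    "region,product,sales\n"
    "north,widget,10\n"
    "south,widget,5\n"
    "north,gadget,7\n"
)
table = InMemoryTable(source)

# Successive train-of-thought queries reuse the same in-memory data.
by_region = table.total_by("region")
north_by_product = table.total_by("product", lambda r: r["region"] == "north")
print(by_region, north_by_product)
```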
A* Modern Data Platform Reference Architecture
Access: Application & Client Layer (All BI Tools, All OLAP Clients, Excel, Reporting)
Analytical Platform, with Near-line Storage (optional)
Persistence Layer: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Cloud Storage, …
*(not THE)
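One way to picture the layering in this reference architecture is a small Python model: clients in the access layer send a query to the analytical platform, which pulls matching rows from whichever persistence-layer systems hold them and combines the results. Everything here (the source names, the `AnalyticalPlatform` and `PersistenceSource` classes, the predicate-style query) is a hypothetical illustration of the layering, not Kognitio's interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class PersistenceSource:
    """A persistence-layer system, e.g. a Hadoop cluster or an EDW (simulated)."""
    name: str
    rows: List[Row]

    def scan(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Tag each matching row with the system it came from.
        return [dict(row, source=self.name) for row in self.rows if predicate(row)]


@dataclass
class AnalyticalPlatform:
    """Middle layer: presents many sources to clients as one logical store."""
    sources: List[PersistenceSource] = field(default_factory=list)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        results: List[Row] = []
        for source in self.sources:
            results.extend(source.scan(predicate))
        return results


hadoop = PersistenceSource("hadoop", [{"customer": 1, "clicks": 40}, {"customer": 2, "clicks": 3}])
edw = PersistenceSource("edw", [{"customer": 1, "revenue": 250.0}])
platform = AnalyticalPlatform([hadoop, edw])

# An access-layer client (BI tool, Excel, report) only talks to the platform.
customer_1 = platform.query(lambda row: row["customer"] == 1)
print(customer_1)
```

The design point being illustrated is that BI tools never need to know where the data physically lives; only the middle layer does.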
© Hortonworks Inc. 2013
(another) Next-Generation Data Architecture
APPLICATIONS: Microsoft Applications; BI Tools & OLAP Clients
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM; In-memory MPP Accelerator; Analytical Platform
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)