Kognitio spark modern data platform print

Date post: 05-Dec-2014
Upload: michael-hiskey
@Kognitio #SparkEvent Hadoop meets Mature BI: Where the rubber meets the road for the Modern Data Platform Michael Hiskey Futurist, Product Evangelist (and VP, Marketing & Business Development)
Transcript
Page 1: Kognitio spark modern data platform print

@Kognitio #SparkEvent

Hadoop meets Mature BI: Where the rubber meets the road for the Modern Data Platform

Michael Hiskey

Futurist, Product Evangelist (and VP, Marketing & Business Development)

Page 2: Kognitio spark modern data platform print
Page 3: Kognitio spark modern data platform print

Today, and the Future

Big Data

Advanced Analytics

In-memory

Modern Data Platform

Hybrid Data Ecosystem ‘Logical Data Warehouse’

Predictive Analytics

Data Scientists

Data

Page 4: Kognitio spark modern data platform print

The Data Scientist

Sexiest job of the 21st Century?

Page 5: Kognitio spark modern data platform print

Data Scientist

The Analytical Enterprise

Business Analyst

Systems Admin

Page 6: Kognitio spark modern data platform print

Remember: Decision Support Systems?

…accessed with ease and simplicity

Historical information, latency

BI tools have plateaued

Advanced analytics & data science

More math…a lot more math

Page 7: Kognitio spark modern data platform print

select Trans_Year,
       Num_Trans,
       count(distinct Account_ID) Num_Accts,
       sum(count(distinct Account_ID)) over (partition by Trans_Year order by Num_Trans) Total_Accts,
       cast(sum(total_spend)/1000 as int) Total_Spend,
       cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend,
       rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) Rank_by_Num_Accts,
       rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Total_Spend
from (
    select Account_ID,
           Extract(Year from Effective_Date) Trans_Year,
           count(Transaction_ID) Num_Trans,
           sum(Transaction_Amount) Total_Spend,
           avg(Transaction_Amount) Avg_Spend
    from Transaction_fact
    where extract(year from Effective_Date) < 2009
      and Trans_Type = 'D'
      and Account_ID <> 9025011
      and actionid in (select actionid
                       from DEMO_FS.V_FIN_actions
                       where actionoriginid = 1)
    group by Account_ID, Extract(Year from Effective_Date)
) Acc_Summary
group by Trans_Year, Num_Trans
order by Trans_Year desc, Num_Trans;

Behind the numbers

Page 8: Kognitio spark modern data platform print

What has changed?

More connected-users?

More-connected users?

Page 9: Kognitio spark modern data platform print

Don’t be a Railroad Stoker!

Highly skilled engineering required … but the world innovated around them.

Page 10: Kognitio spark modern data platform print

The drive for deeper understanding

Chart axes: Analytical Complexity vs. Technology/Automation

Machine learning algorithms

Dynamic Simulation

Statistical Analysis

Clustering

Behavior modelling

Reporting & BPM

Fraud detection

Dynamic Interaction

Campaign Management

Page 11: Kognitio spark modern data platform print

Key: “Graduation”

Projects will need to graduate from the Data Science Lab and become part of Business as Usual

Page 12: Kognitio spark modern data platform print

Your goal: 

PRESS HERE…and really cool Big Data stuff happens!

Page 13: Kognitio spark modern data platform print

Data flow

Page 14: Kognitio spark modern data platform print

© 20th Century Fox

Page 15: Kognitio spark modern data platform print

No need to pre-process

No need to align to schema

No need to triage

Null storage concerns

Page 16: Kognitio spark modern data platform print

Hadoop just too slow for interactive BI!

…loss of train-of-thought

“while Hadoop shines as a processing platform, it is painfully slow as a query tool”

Page 17: Kognitio spark modern data platform print

Hadoop is…

inherently disk oriented

typically low ratio of CPU to Disk

Lots of these

Not so many of these

Page 18: Kognitio spark modern data platform print

Analytics needs low latency, no I/O wait

High speed in‐memory processing

Page 19: Kognitio spark modern data platform print

A* Modern Data Platform Reference Architecture

*(not THE)

Analytical Platform

Near-line Storage (optional)

Access

Application & Client Layer

All BI Tools, All OLAP Clients, Excel

Persistence Layer

Hadoop Clusters

Enterprise Data Warehouses

Legacy Systems

Reporting

Cloud Storage

Page 20: Kognitio spark modern data platform print

© Hortonworks Inc. 2013

(another) Next-Generation Data Architecture

APPLICATIONS

Microsoft Applications

BI Tools & OLAP Clients

DATA SYSTEMS

In-memory MPP Accelerator

TRADITIONAL REPOS: RDBMS, EDW, MPP

HORTONWORKS DATA PLATFORM

OPERATIONAL TOOLS: MANAGE & MONITOR

DEV & DATA TOOLS: BUILD & TEST

DATA SOURCES

Traditional Sources (RDBMS, OLTP, OLAP)

New Sources (web logs, email, sensors, social media)

Page 21: Kognitio spark modern data platform print

Analytical Platform

Page 22: Kognitio spark modern data platform print

It’s all about getting work done

Tasks evolving:

Used to be simple fetch of value

Then was compute dynamic aggregate

Now complex algorithms!

