Hadoop and the new BI: The Modern Data Architecture… for in-memory Big Data Analytics
10 December 2013
© Hortonworks Inc. 2013
Quick Housekeeping
Q&A box is available for your questions
Webinar will be recorded for future viewing
Thank you for joining!
Modern Data Architecture… for in-memory Big Data Analytics
Your Presenters
• Paul Groom (@datagroom)
  – Chief Innovation Officer
  – 28 years buried in the big data of the data, guiding business users to value
  – Two wheels are more fun than four
• John Kreisa (@marked_man)
  – VP Strategic Marketing, Hortonworks
  – Over 20 years in data management as a developer and a marketer
  – Avid camper
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• Kognitio’s role in the MDA
• Q&A
Existing Data Architecture
[Diagram: existing data architecture. Sources: existing sources (CRM, ERP, clickstream, logs). Data system repositories: RDBMS, EDW, MPP. Applications: business analytics, custom applications, packaged applications.]
Source: IDC
• 2.8 ZB in 2012
• 85% from new data types
• 15x machine data by 2020
• 40 ZB by 2020
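The two IDC totals imply a steep growth rate. Here is a back-of-the-envelope check. It assumes smooth compound growth over the eight years from 2012 to 2020, which is an illustrative assumption and not IDC's stated model. Under that assumption, the implied compound annual growth rate is:

```latex
\mathrm{CAGR} = \left(\frac{40\,\mathrm{ZB}}{2.8\,\mathrm{ZB}}\right)^{1/8} - 1
             \approx 14.29^{0.125} - 1
             \approx 0.39
```

That is roughly 39% more data each year. Equivalently, the data volume doubles about every two years, since ln 2 / ln 1.39 ≈ 2.1.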
Modern Data Architecture Enabled
[Diagram: the existing architecture extended. Sources: existing (CRM, ERP, clickstream, logs) and emerging (sensor, sentiment, geo, unstructured). Data system repositories: RDBMS, EDW, MPP. Applications: business analytics, custom applications, packaged applications. Operational tools: manage & monitor. Dev & data tools: build & test.]
Hadoop Powers Modern Data Architecture
Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF) that allows you to gain insight from massive amounts of structured and unstructured data quickly and without significant investment.
[Diagram: a Hadoop cluster of many nodes, each providing both compute and storage]
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
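The sketch below shows the "compute & storage on every node" idea in concrete form. It is a minimal single-process version of the MapReduce pattern: each node maps over its own local data, the intermediate pairs are shuffled by key, and then they are reduced. It only simulates the model in plain Python and does not use Hadoop's Java APIs. The node layout and the names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: each inner list is one node's local block of text lines,
# mimicking how HDFS spreads file blocks across the machines of a cluster.
NodeData = List[str]


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Runs on each node against its local data: emit (word, 1) pairs."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate pairs by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine all values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(nodes: List[NodeData]) -> Dict[str, int]:
    # On a real cluster the map calls run in parallel, one per node, next to
    # the data; here they simply run one after another.
    mapped = [map_phase(node) for node in nodes]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    cluster = [["big data on Hadoop"], ["Hadoop stores data", "data everywhere"]]
    print(word_count(cluster))
```

The point of the pattern is that the map work moves to the nodes holding the data. Only the much smaller intermediate pairs are exchanged during the shuffle.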
Drivers of Hadoop Adoption
• New Business Applications
  – From NEW types of data (or existing types kept for longer)
Most Common NEW TYPES OF DATA
1. Sentiment: Understand how your customers feel about your brand and products, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
Keep Existing Data Around Longer
• Online archive
  – Data that was once moved to tape can now be queried to understand long-term trends
• Compliance retention
  – Industry-specific requirements for retention of data
• Combine with external historical data sources
  – Weather, survey, research, purchased, etc.
Drivers of Hadoop Adoption
• Architectural: A Modern Data Architecture
  – Complement your existing data systems, with the right workload in the right place
• New Business Applications
Requirements for Hadoop Adoption
Requirements for Hadoop’s Role in the Modern Data Architecture
• Key Services – Platform, operational and data services essential for the enterprise
• Skills – Leverage your existing skills: development, operations, analytics
• Integrated – Interoperable with existing data center investments
Requirements for Enterprise Hadoop
1. Key Services – Platform, operational and data services essential for the enterprise
2. Skills – Leverage your existing skills: development, analytics, operations
3. Integrated – Engineered with existing data center investments
[Diagram: Hortonworks Data Platform (HDP)
  – Operational Services: Oozie, Ambari, Falcon*
  – Data Services: Hive & HCatalog, Pig, HBase; Load & Extract: Sqoop, Flume, NFS, WebHDFS, Knox*
  – Core: HDFS, YARN, MapReduce, Tez
  – Platform Services: Enterprise Readiness (High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots)
  – OS/VM, Cloud, Appliance]
Requirements for Enterprise Hadoop
1. Key Services – Platform, operational and data services essential for the enterprise
2. Skills – Leverage your existing skills: development, analytics, operations
3. Integration – Engineered with existing data center investments
[Diagram: Develop (collect, process, build); Analyze (explore, query, deliver); Operate (provision, manage, monitor)]
Familiar and Existing Tools
Requirements for Enterprise Hadoop
3. Integration – Engineered with existing data center investments
[Diagram: Sources: existing (CRM, ERP, clickstream, logs) and emerging (sensor, sentiment, geo, unstructured). Data system repositories: RDBMS, EDW, MPP. Applications: business analytics, custom applications, packaged applications. Operational tools: manage & monitor. Dev & data tools: build & test.]
Integrated with:
• Applications – Business Intelligence, Developer IDEs, Data Integration
• Systems – Data Systems & Storage, Systems Management
• Platforms – Operating Systems, Virtualization, Cloud, Appliances
A Modern Data Architecture Applied
Complement data systems: the right workload in the right place
[Diagram: Sources: existing (CRM, ERP, clickstream, logs) and emerging (sensor, sentiment, geo, unstructured). Data system repositories: RDBMS, EDW, MPP. Applications: business analytics, custom applications, packaged applications.]
Kognitio in the Modern Data Architecture
[Diagram: the Modern Data Architecture (existing and emerging sources; RDBMS, EDW and MPP repositories; operational tools to manage & monitor; dev & data tools to build & test) with an in-memory MPP accelerator added. Applications: business analytics, business intelligence tools, OLAP clients.]
Kognitio in the Modern Data Architecture
[Diagram: the same architecture with an in-memory MPP accelerator shown alongside HANA and BusinessObjects BI. Layers: sources (existing: CRM, ERP, clickstream, logs; emerging: sensor, sentiment, geo, unstructured), data system (RDBMS, EDW, MPP), infrastructure, operational tools, dev & data tools.]
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• Kognitio’s role in the MDA
• Q&A
Hadoop and the new BI
Motivation
• Historical architecture = existing investment
• Must plug and play with the MDA
  – Do not disrupt; enhance!
• Performance and behavior expectations
  – Dynamic ad-hoc access
  – Unlimited drill-down
  – On-demand reporting
Business [Intelligence] Desires
• More timely
• Lower latency
• More granularity
• Better concurrency
• Richer data model
• Self-service
BI Activity
Insulate the Hadoop cluster
In-memory analytical platform
• Software only
  – Easy to deploy alongside HDP
  – Simple two-stage install
• Commodity hardware
  – x86-64 Linux platform with 10GbE network, the same as HDP
  – Biased towards more RAM and less disk
• Scale-out MPP
  – Same compute model as Hadoop
  – Strong focus on 100% effective CPU utilization for any given query
• Exploits features of the underlying persistent store
  – Simple ‘pull data’ access methods
  – Parallelism: all HDP nodes intercommunicating with all Kognitio nodes
• ANSI SQL:2011
  – Mature and fully featured
  – Transaction processing capable
• Not-only-SQL
  – Any scripts or binaries can be executed in-line within SQL queries
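"All HDP nodes intercommunicating with all Kognitio nodes" describes an all-to-all parallel transfer. The sketch below shows the general idea only. It assumes that each row is routed to an in-memory node by hashing a key. The routing rule, the `(key, value)` row shape, and the names `target_for` and `parallel_pull` are all hypothetical, and this is not Kognitio's actual loader.

```python
import zlib
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical (key, value) record


def target_for(key: str, num_targets: int) -> int:
    """Choose the receiving in-memory node for a row (hypothetical rule).

    zlib.crc32 returns the same value in every process, unlike Python's
    salted built-in hash() for strings, so all senders agree on routing.
    """
    return zlib.crc32(key.encode("utf-8")) % num_targets


def parallel_pull(source_nodes: List[List[Row]], num_targets: int
                  ) -> Tuple[List[List[Row]], Dict[Tuple[int, int], int]]:
    """Simulate every source node streaming its local rows to every target.

    Returns the rows received by each target and the number of rows sent
    over each (source, target) link. In a real system the sources would
    send concurrently; this loop just visits them in turn.
    """
    received: List[List[Row]] = [[] for _ in range(num_targets)]
    links: Dict[Tuple[int, int], int] = {}
    for src, rows in enumerate(source_nodes):
        for row in rows:
            dst = target_for(row[0], num_targets)
            received[dst].append(row)
            links[(src, dst)] = links.get((src, dst), 0) + 1
    return received, links


if __name__ == "__main__":
    hdfs_nodes = [
        [("alice", 3), ("bob", 5)],
        [("carol", 1), ("alice", 2)],
        [("dave", 7), ("bob", 4)],
    ]
    received, links = parallel_pull(hdfs_nodes, num_targets=2)
    for i, rows in enumerate(received):
        print(f"in-memory node {i}: {rows}")
    print("rows per (source, target) link:", links)
```

Each source decides the destination of every row on its own. No central coordinator sits on the data path, so every source-to-target link can carry data at the same time.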
Tight Integration
• HDFS Connector – low-latency access
• MapReduce Connector – filtered access
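This sketch assumes that "filtered access" means row filters and column projections are applied inside the Hadoop cluster before any data is shipped. That is our reading of the slide, not documented connector behaviour. Under that assumption, the benefit comes down to how much data crosses the network. The `pull` helper and the log-record fields are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence

Record = Dict[str, Any]


def pull(records: Iterable[Record],
         where: Optional[Callable[[Record], bool]] = None,
         columns: Optional[Sequence[str]] = None) -> List[Record]:
    """Apply a row filter and a column projection at the source (hypothetical).

    Only rows that pass `where`, and only the fields named in `columns`,
    are returned, i.e. only they would be shipped over the network.
    """
    shipped: List[Record] = []
    for rec in records:
        if where is not None and not where(rec):
            continue  # rejected inside the cluster, never transferred
        shipped.append({c: rec[c] for c in columns} if columns else dict(rec))
    return shipped


if __name__ == "__main__":
    # Hypothetical web-server log records stored in Hadoop.
    logs = [
        {"url": "/home", "status": 200, "bytes": 5120},
        {"url": "/cart", "status": 500, "bytes": 310},
        {"url": "/home", "status": 200, "bytes": 4980},
        {"url": "/pay", "status": 503, "bytes": 290},
    ]
    everything = pull(logs)  # unfiltered read: every row and field is shipped
    errors = pull(logs, where=lambda r: r["status"] >= 500, columns=["url"])
    print(len(everything), "rows shipped unfiltered;", errors, "shipped filtered")
```

The trade-off, under these assumptions: an unfiltered read avoids the cost of launching a job, while a filtered pull spends cluster compute to cut transfer volume.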
So why In-memory?
• Exploit the fast random access of DRAM
  – Data placed in memory in structures best suited for CPUs, not for disks
[Figure: “Instant” vs. “Wait”]
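"Structures best suited for CPUs" usually means layouts that let a query scan only the values it needs, stored contiguously. A columnar layout is one common example. This is an assumption about what such structures look like, not a description of Kognitio's internal format. The sketch computes the same aggregate over a row layout and a column layout, using typed arrays from Python's standard `array` module. The sales data is made up.

```python
from array import array
from typing import List, Tuple

# Hypothetical sales rows: (region_id, units, price)
rows: List[Tuple[int, int, float]] = [(1, 3, 9.5), (2, 1, 20.0), (1, 4, 9.5), (3, 2, 15.0)]


def to_columns(records: List[Tuple[int, int, float]]) -> Tuple[array, array, array]:
    """Store each attribute contiguously in its own typed array (columnar layout)."""
    region = array("i", (r[0] for r in records))
    units = array("i", (r[1] for r in records))
    price = array("d", (r[2] for r in records))
    return region, units, price


def revenue_row_layout(records: List[Tuple[int, int, float]]) -> float:
    # Row layout: every whole record is visited, although only 2 fields are needed.
    return sum(units * price for _, units, price in records)


def revenue_column_layout(units: array, price: array) -> float:
    # Column layout: scan only the two contiguous arrays the query touches.
    return sum(u * p for u, p in zip(units, price))


if __name__ == "__main__":
    region, units, price = to_columns(rows)
    print(revenue_row_layout(rows), revenue_column_layout(units, price))
```

In pure Python, interpreter overhead hides any speed difference. The point is the memory layout. In native code, contiguous same-typed values stream efficiently through CPU caches and vector units, which is the kind of "CPU-friendly" structure the slide contrasts with disk-oriented row storage.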
In-memory – getting work done
Building Data Models
• Hadoop is a great repository
  – Perfect for handling volume and variability without effort
  – Perfect for ‘triaging’ the data: reshaping, filtering and projecting it into…
• Data Virtualisation / Logical Data Warehouse… but with the associated horsepower to analyse the data dynamically
• Plug standard tools straight in – not a Java programmer in sight!
• Central control and security
• Data model shelf life is getting shorter – sandboxes and workbenches
  – Build on demand to meet today’s needs – just pull data from your HDP
  – Lots of project-based discovery and analytics
  – World is changing rapidly
  – Ever-tighter feedback loops
Increasing Computation
[Chart: analytic workloads (Campaign Management, Reporting & BPM, Fraud detection, Behaviour modelling, Clustering, Statistical Analysis, Simulation, Machine learning algorithms, Dynamic Interaction) plotted against Analytical Complexity and Technology/Automation]
The Analytical Enterprise
[Diagram: Business Analyst, Systems Admin, Data Scientist]
Key: “Graduation”
• Projects will need to graduate easily from the Data Science Lab and become part of Business as Usual
Mature SQL atop Hadoop
Kognitio is an in-memory analytical platform that is tightly integrated with Hadoop for high-performance advanced analytics that make Big Data more consumable for enterprises, especially those with mature BI environments or ingrained tools.
• Privately held
• Invented the in-memory analytical platform
• Labs in the UK; HQ in New York, NY
• Powering advanced analytics at organizations worldwide, such as:
Kognitio in the Modern Data Architecture
[Diagram: the Modern Data Architecture (existing and emerging sources; RDBMS, EDW and MPP repositories; operational tools to manage & monitor; dev & data tools to build & test) with an in-memory MPP accelerator added. Applications: business analytics, business intelligence tools, OLAP clients.]
Forrester Wave: a “strong performer”
© Forrester Corp. Used with permission.
• Kognitio’s entirely in-memory, distributed EDW is appealing for customers looking for fast performance on commodity hardware
• Kognitio’s EDW is a strong, cost-effective alternative to SAP HANA.
• Kognitio…was designed from the start as an MPP (distributed) in-memory RDBMS, making extensive use of RAM-based processing for maximum performance.
• Download a complimentary copy of the full report at www.kognitio.com/wave
Question & Answer session will be conducted electronically, using the panel to the right of your screen
Today’s Slides available at: www.slideshare.net/kognitio
The Modern Data Architecture… for in-memory Big Data Analytics
More about Kognitio and Hortonworks: http://hortonworks.com/partner/kognitio
Get started with the Hortonworks Sandbox: http://hortonworks.com/hadoop-tutorial/
Follow us: @hortonworks @kognitio