Big Data Analytics: an infrastructure and data management perspective
BDCA; Kick Off User Group Cross Meetup, March 3rd, 2015
Jürgen Türk, CSE NetApp
© 2014 NetApp, Inc. All rights reserved. NetApp Proprietary – Limited Use Only
Agenda
1. Who is NetApp?
2. NetApp approach to Big Data
3. Analytics Solutions – Reference Architectures
4. Case Studies
5. Wrap Up - Next Steps
NetApp Product Strategy
Market-leading innovations that are:
• Shared and Dedicated Storage Solutions
• Flash Accelerated
• Cloud Integrated
NetApp and Big Data
The 3V Paradigm
• Variety
  – Multiple data sources
  – Multiple data formats
• Velocity
  – High speed processing
  – Fast changing requirements
• Volume
  – Huge amounts of data
  – Process and persist
Why NetApp?
Practical solutions that solve today’s problems
Get Control
NetApp helps you turn your exploding data from threat to opportunity. Manage your data effectively and affordably.
Break Through
Break through the limits. With NetApp, you can take on even the most massive and complex data projects.
Gain Insight
Turn insight to action. NetApp helps you get to clarity and insight faster and more reliably.
Experience Managing Data at Scale
[Figure: customers by managed capacity. Customer tiers: 100, 50, 10, and 4 customers. Capacity tiers: 100 PB, 50 PB, 20 PB, 10 PB. Top of chart: NetApp’s largest customer.]
Experience Managing Data at Scale
• Best-of-breed storage for Big Data applications
• Built on open standards with best-in-class partnerships
• Validated with ecosystem leaders
• Complete server, network and storage “Racks”
• Delivered via trusted high-value partners
Open
Best-of-Breed
Choice
Value Proposition
Some problems require an Enterprise Class Hadoop Solution
Enterprise Class Hadoop
Packaged ready-to-deploy modular Hadoop cluster
• The data has intrinsic value ($$$)
• Usable capacity must expand faster than compute
• Higher storage performance
• Real human consequences if the system fails (threats, treatments, financial losses)
• System has to allow for asymmetric growth
White Box Hadoop
Values associated with early adopters of Hadoop
• Social media space
• Contributors to Apache
• Strong bias to JBOD
• Skeptical of ALL vendors
Enterprise Class Hadoop
Packaged ready-to-deploy modular compute / memory intensive Hadoop cluster
• Compute intensive applications
• Tick data analysis
• Extremely tight Service Level expectations
• Severe financial consequences if the analytic run is late
Enterprise Class Hadoop
Bounded compute algorithm / memory intensive Hadoop cluster
• Compute intensive applications
• Additional CPUs do not improve run time
• Extremely tight Service Level expectations
• Severe financial consequences if the analytic run is late
• Need for deeper storage per datanode
[Figure axes: Compute Power, Storage Capacity]
Challenges with Hadoop Enterprise
Operations
Implementation
• Requires three copies of data, larger footprint, and more storage
• Limited flexibility; storage and servers tied together affects scalability
• Low cluster efficiency, higher network congestion
• A disk drive failure reduces performance dramatically
• Slow recovery from disk drive failure
• Expensive process to replace failed disks online
• Most common Hadoop support issue is disk drive failure
Availability
• Need to keep up with the fast-paced patches and projects of the open source platform
• Need to decide on a distribution of Hadoop
• Skills are not common
• Integration with existing IT infrastructure can be difficult
• Tuning expertise needed to make Hadoop perform optimally
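The "three copies of data" challenge listed above translates directly into raw capacity. The following is a rough sketch, assuming a replication factor of 3 as stated on the slide and ignoring headroom for temporary or intermediate data. The function names and the 100 TB figure are illustrative and not taken from the deck.

```python
def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk capacity needed to hold `usable_tb` of data when every
    block is stored `replication_factor` times."""
    if usable_tb < 0 or replication_factor < 1:
        raise ValueError("usable_tb must be >= 0 and replication_factor >= 1")
    return usable_tb * replication_factor


def storage_efficiency(replication_factor: int = 3) -> float:
    """Fraction of raw capacity that holds unique data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return 1.0 / replication_factor


if __name__ == "__main__":
    # Hypothetical data set size, chosen only for illustration.
    print(raw_capacity_tb(100))            # raw TB for 100 TB of unique data
    print(round(storage_efficiency(), 3))  # share of raw disk holding unique data
```

With three copies, only about one third of the raw disk holds unique data, which is the "larger footprint" the slide refers to.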
FlexPod Converged Infrastructure Family
Distinct Architectures
FlexPod® Express (MSB/Branch Office)
For smaller, less-dynamic requirements and VAR velocity
• Cisco UCS C-Series
• Nexus® 3K
• FAS2xx0
• Two fixed pod sizes
• Cisco UCS Director, VMware®, and Microsoft®
FlexPod Data Center (Enterprise/Service Provider)
Massively scalable shared virtual data center infrastructure
• Cisco UCS C-Series/B-Series, Nexus® 5k
• FAS Storage
• Flexible pod sizes
• FlexPod validated management and ecosystem
FlexPod Select (Dedicated)
Big data analytics, scientific, HPC
• Cisco UCS C-Series
• Nexus, Catalyst®, MDS
• E-Series, FAS
• Reference architecture and/or designs
• Application-based management
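[Figure: stack diagrams for the three FlexPod variants. Two stacks show pooled layers (Storage Pool, Network Pool, Compute Pool) serving multiple Apps; one stack shows dedicated layers (Storage, Network / Direct, Compute Nodes) serving a single App.]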
Faster deployment and implementation
Small management effort: one hotline for everything
Seamless growth on demand
Modular reference architecture: “Building Blocks” tuned to work together
FlexPod Select = especially optimized for Big Data workloads
More operational efficiency with less effort
Maximum flexibility: the Unified Architecture ensures that a FlexPod can be integrated into an existing IT infrastructure
Big Data Analytics platform for data centers
Scalable and highly available architecture
Quick and risk-free implementation
Optimized and standardized operation
24x7 hotline for the entire infrastructure
All components are perfectly tuned to one another
Plug & Play for Industrie 4.0 solutions
NFSv3 Connector for Hadoop
* HDFS can be swapped out for NFS, or NFS can run side-by-side with HDFS.
[Figure: Hadoop stack, standard vs. with the NFS connector]
User jobs: Job
Compute layer: MapReduce
Resource layer: YARN
File System
Storage layer: HDFS (standard); NFS / HDFS (with connector)
HDFS gets complemented with NFS*
Faster to procure
Faster to implement
Lower management effort
One hotline for everything
Grows with your requirements
Modular reference architecture: “Building Blocks” always fit together optimally
FlexPod Select = specially optimized for Big Data workloads
More operational reliability with less effort
Maximum flexibility: the Unified Architecture ensures that the FlexPod can also be integrated into existing IT environments.
Data-center-compliant Big Data Analytics platform
Scalable and highly available architecture
Fast, risk-free implementation
Optimized and standardized operation
24x7 hotline for the entire infrastructure
All components are perfectly matched to one another
Plug & Play for Industrie 4.0 solutions
Certified Storage for HANA TDI + Hadoop
FAS Product Family
• 7-mode and cDOT
• NAS: shared file system
• 10Gb Ethernet and NFS
• Single node and multi-node
E-Series Product Family
• SAN: block device
• FC and XFS
• Single node and multi-node
Example: FlexPod Select with Cloudera
* NetApp 50% Storage Guarantee http://www.netapp.com/us/solutions/infrastructure/virtualization/guarantee.html
• Converged big data platform from NetApp and Cisco for Hadoop
• Enterprise-class Hadoop: Innovative storage, servers, networking validated with leading Hadoop distributions
• Faster time to value: Prevalidated configuration accelerates deployment
• High availability: Less downtime, higher serviceability to meet tight SLAs around data applications and processes
• Flexible scaling: Independently scale servers and storage; modular design for scaling as data needs grow
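The "flexible scaling" point can be illustrated with a small sizing sketch. It compares a cluster where each node bundles a fixed amount of compute and storage with one where servers and storage arrays are sized independently. All names and numbers (cores per node, TB per node, TB per array) are hypothetical and chosen only for illustration; they are not FlexPod Select specifications.

```python
import math


def coupled_nodes(cores_needed: int, storage_tb: float,
                  cores_per_node: int, tb_per_node: float) -> int:
    """Nodes needed when every node bundles fixed compute and storage.
    The larger of the two requirements dictates the node count."""
    by_compute = math.ceil(cores_needed / cores_per_node)
    by_storage = math.ceil(storage_tb / tb_per_node)
    return max(by_compute, by_storage)


def decoupled(cores_needed: int, storage_tb: float,
              cores_per_server: int, tb_per_array: float) -> tuple[int, int]:
    """Servers and storage arrays sized independently of each other."""
    servers = math.ceil(cores_needed / cores_per_server)
    arrays = math.ceil(storage_tb / tb_per_array)
    return servers, arrays


if __name__ == "__main__":
    # Hypothetical storage-heavy workload: 160 cores, 960 TB.
    print(coupled_nodes(160, 960, cores_per_node=16, tb_per_node=24))
    print(decoupled(160, 960, cores_per_server=16, tb_per_array=240))
```

In this made-up storage-heavy case the coupled design is driven by capacity alone and ends up with far more servers than the compute requirement calls for, which is the asymmetric-growth problem the deck describes.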
Cisco UCS® C-Series Rack Mount Servers
NetApp® FAS Storage Systems
NetApp E-Series Storage Array
Cisco UCS Manager
Cisco UCS Fabric Interconnect
Use Case Example: NetApp AutoSupport
• Correlate disk latency (hot) with disk type
  – 24 billion records
  – 4 weeks to run query
  – Hadoop implementation 10.5 hours
• Bug detection through pattern matching
  – 240 billion records
  – Too large to run
  – Hadoop implementation 18 hours
Phone home data representing information about the status of NetApp storage controllers
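To put the run times quoted above in perspective, the following sketch works out the speedup and the implied record throughput. It assumes that "4 weeks" means four full calendar weeks of wall-clock time, which the deck does not state explicitly. The function names are illustrative.

```python
HOURS_PER_WEEK = 7 * 24


def speedup(baseline_hours: float, hadoop_hours: float) -> float:
    """How many times faster the Hadoop run is than the baseline."""
    return baseline_hours / hadoop_hours


def records_per_second(records: float, hours: float) -> float:
    """Average throughput over the whole run."""
    return records / (hours * 3600)


if __name__ == "__main__":
    # Disk latency correlation: 24 billion records, 4 weeks vs. 10.5 hours.
    baseline_hours = 4 * HOURS_PER_WEEK  # assumption: wall-clock weeks
    print(speedup(baseline_hours, 10.5))
    print(round(records_per_second(24e9, 10.5)))

    # Bug detection: 240 billion records in 18 hours (no baseline: "too large to run").
    print(round(records_per_second(240e9, 18)))
```

Under that assumption the latency correlation runs about 64 times faster on Hadoop, at roughly 635,000 records per second; the pattern-matching job averages roughly 3.7 million records per second.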
Hortonworks
SAP LVM (Landscape Virtualization Management)
SAP HANA Studio
Smart Data Access
E-Series 5600
10Gb Ethernet and NFS
FlexPod Select with Hadoop
UCS C-Series Server
FAS8040 HA Pair with cDOT
10Gb Ethernet and NFS
FlexPod SAP HANA Database Nodes
UCS Blade Server
FlexClone Copies
SnapCreator HANA Plugin
SAP Lumira
Mobile Device
Call to action: get started
Identification of use case
Connect to Analytics Expert + Connect IT and LOB
Workshop
Proof of Concept
Business case
Readiness check
RUN
Go Productive