Date posted: 19-Jun-2015 | Category: Technology | Uploaded by: vu-hung-nguyen
Intel Confidential — Do Not Forward Intel Information Technology
Big Data and HPC technologies 24 July, 2014
Austin Cherian
Sr. Applications Engineer
*Other names and brands may be claimed as the property of others. INTEL CONFIDENTIAL
Legal Disclaimer - Notice INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: http://www.intel.com/products/processor_number
Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For more information, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.s. For more information, visit http://www.intel.com/technology/security
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization
Requires a system with Intel® Turbo Boost Technology. Intel Turbo Boost Technology and Intel Turbo Boost Technology 2.0 are only available on select Intel® processors. Consult your PC manufacturer. Performance varies depending on hardware, software, and system configuration. For more information, visit http://www.intel.com/go/turbo
Copyright © 2014 Intel Corporation. All rights reserved. Intel, Intel Xeon, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Legal Disclaimers - Performance Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.
SPEC, SPECint, SPECfp, SPECrate, SPECpower_ssj, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information.
TPC Benchmark is a trademark of the Transaction Processing Council. See http://www.tpc.org for more information.
SAP and SAP NetWeaver are the registered trademarks of SAP AG in Germany and in several other countries. See http://www.sap.com/benchmark for more information.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3,
and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability,
functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please
refer to the applicable product User and Reference Guides for more information regarding the specific
instruction sets covered by this notice.
Notice revision #20110804
Legal Disclaimers All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to
change without notice. Romley, Ivy Bridge, Sandy Bridge, Westmere, Nehalem, Harpertown, and certain other names are code names used to identify
unreleased Intel products. Intel makes no warranty of trademark non-infringement, and use of these code names by third parties is at their own risk.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,
BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information.
The Intel® Xeon® Processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel, the Intel logo, Intel® Virtualization Technology, Intel® I/O Acceleration Technology, Intel® VTune™ Analyzer, Intel® Thread Checker™, Intel® Tools, Intel® Trace Analyzer and Collector and Intel® Xeon™ are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Hyper-Threading Technology requires a computer system with a processor supporting HT Technology and an HT Technology-enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. For more information including details on which processors support HT Technology, see here
“Intel® Turbo Boost Technology requires a PC with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration. Check with your PC manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see http://www.intel.com/technology/turboboost.”
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and, for some uses, certain computer system software enabled for it. Functionality, performance or other benefits will vary depending on hardware and software configurations and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.
Agenda
• Intel Xeon Architecture Roadmap
• Big Data Trends
• Intel in Big Data
• The Big Data HPC connection
Intel Xeon Architecture Roadmap
Intel Confidential Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands
may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright © 2013, Intel Corporation.
Intel Tick Tock Model – Inspiring Confidence
Typically, Increases in Transistor Density Enable New Capabilities, Higher Performance Levels, and Greater Energy Efficiency
[Figure: tick-tock roadmap across the 45nm, 32nm, and 22nm process nodes. Nehalem microarchitecture: Nehalem, Westmere. Sandy Bridge microarchitecture: Sandy Bridge, Ivy Bridge. Haswell microarchitecture: Haswell. Each step is labeled TICK or TOCK.]
[Figure: bar chart of preliminary relative performance against an Intel® Xeon® E5-2697 v2 baseline; vertical axis from 0.00 to 2.00. Benchmarks: STREAM (Triad), SPECfp*_rate_base2006, SPECjbb* 2013 MultiJVM, SPECint*_rate_base2006, Brokerage OLTP, Warehouse OLTP, Linpack. Data labels as extracted: 1.14, 1.27, 1.28, 1.30, 1.35, 1.41, 1.9.]
Intel® Xeon® E5-26xx v3 (14C, 2.7GHz, 145W) vs. Intel® Xeon® E5-2697 v2 (12C, 2.7GHz, 130W)
Intel® Xeon® Processor E5-2600 v3 Product Family Preliminary Performance Expectations
Source: Intel internal estimates as of 17 Nov 2013. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance *Other names and brands may be claimed as the property of others.
Up to 37% performance boost on average expected over previous Xeon® generation
Intel® Xeon Phi™ Coprocessors: Formerly Code-Named Knights Corner
Intel® Xeon® Processors + Intel® Xeon Phi™ Coprocessors: Complementary Solutions for Parallel Workloads
Leadership performance for the majority of server & workstation workloads
Versatile foundation to meet rapid growth in users, devices, and data
Robust energy efficiency, security, and reliability to reduce data center costs
Advanced performance for highly parallel workloads for breakthrough innovation and discovery
Based on Intel® MIC Architecture; Works synergistically with Intel® Xeon® Processors
Increased developer productivity via programming models & tools common with Intel® Xeon® Processors
Develop with Intel tools for Intel® Xeon® processors today; scale your software investment to include Intel® Xeon Phi™ products
Intel® Xeon Phi™ Product Family Path to Performance and Programmability
• Knights Corner, Intel® Xeon Phi™ x100 Product Family (available today): 22 nm process; coprocessor; over 1 TF DP peak; up to 61 cores; up to 16GB GDDR5
• Knights Landing, Intel® Xeon Phi™ x200 Product Family (2H’15*): 14 nm process; server processor & coprocessor; over 3 TF DP peak (1); 60+ cores; up to 400GB memory; ~500 GB/s sustained memory bandwidth. Shown as Knights Landing and Knights Landing with Fabric.
• Future 3rd generation (TBA): in planning
* First commercial systems. All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
(1) Over 3 teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating-point operations per cycle. FLOPS = cores x clock frequency x floating-point operations per cycle.
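The peak-FLOPS relation given in the footnote on this slide can be worked through in a few lines of Python. This is only a sketch: the 61-core count is the "up to" figure quoted for Knights Corner, while the 1.1 GHz clock and the 16 double-precision operations per cycle are illustrative assumptions, not values taken from the slide.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak FLOPS = cores x clock frequency x FP operations per cycle."""
    return cores * clock_hz * flops_per_cycle


# 61 cores is the "up to" figure quoted for Knights Corner.
# The clock and per-cycle values below are assumptions for illustration only.
CORES = 61
CLOCK_HZ = 1.1e9          # assumed 1.1 GHz
DP_FLOPS_PER_CYCLE = 16   # assumed: 8 double-precision lanes x 2 (fused multiply-add)

peak = peak_flops(CORES, CLOCK_HZ, DP_FLOPS_PER_CYCLE)
print(f"Theoretical DP peak: {peak / 1e12:.2f} TFLOPS")
```

Under these assumed inputs the result lands slightly above 1 TFLOPS, which is in line with the "over 1 TF DP peak" figure; real parts will differ with their actual clock and vector width.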
Weather Research and Forecasting (WRF) Conus 2.5 km
Application: Weather Research and Forecasting (WRF)
Availability: WRF V3.5 was released 4/18/13. https://software.intel.com/en-us/articles/how-to-get-wrf-running-on-the-intelr-xeon-phitm-coprocessor
Code Optimization: Approximately two dozen files with less than 2,000 lines of code were
modified (out of approximately 700,000 lines of code in about 800 files, all Fortran standard compliant)
Most modifications improved performance for both the host and the co-processors
Performance Measurements: V3.5 and NCAR supported CONUS2.5KM benchmark (a high resolution weather forecast)
Acknowledgments: There were many contributors to these results, including the National
Renewable Energy Laboratory and The Weather Channel Companies
SOURCE: INTEL MEASURED RESULTS AS OF NOVEMBER, 2013
[Figure: bar chart of speedup (higher is better). 2S Intel® Xeon® processor E5-2670: 1.00. 2S Intel® Xeon® processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW): 1.56.]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
Big Data Trends
Virtuous Cycle of Data-Driven Innovation
2.8 zettabytes of data generated worldwide in 2012 (1)
Richer user experiences
Richer data from devices
40 zettabytes of data will be generated worldwide in 2020 (1)
Richer data to analyze
Cloud
Clients
Intelligent Systems
(1) IDC Digital Universe 2020, (2) IDC
Forces Driving Big Data Advancement
• HPC: Enabling exascale computing on massive data sets
• Cloud: Helping enterprises build open interoperable clouds
• Open Source: Contributing code and fostering ecosystem
Intel® True Scale InfiniBand
Enterprise and Big Data: How is it used in different sectors?
Big data can generate significant financial value across sectors
• US health care: $300 billion value per year; ~0.7 percent annual productivity growth
• Europe public sector administration: €250 billion value per year; ~0.5 percent annual productivity growth
• Global personal location data: $100 billion+ revenue for service providers; up to $700 billion value to end users
• US retail: 60+% increase in net margin possible; 0.5-1.0 percent annual productivity growth
• Manufacturing: Up to 50 percent decrease in product development, assembly costs; up to 7 percent reduction in working capital
SOURCE: McKinsey Global Institute analysis
Solving today’s problems faster
• FMCG: Problem solving ideas, product decisions, predict market reaction, enhance brand relevance
• Financial: To detect credit card fraud, greater accuracy and granularity in risk assessment
• Cable and TV: Customize TV ads to individual household
• Mobile Ads: Ads based on locations
• Retail: Insight and understanding of customer likes, dislikes, influences and behaviors
• General business: To discover unknown and untapped behaviors and attitudes
• Healthcare: Evidence based medicine
• Utilities: Understanding of individualized use patterns and better demand management
• Online businesses: Better understanding of customer preferences and social interactions
• Crime prevention/intelligence: Better analysis of patterns
SOURCE: McKinsey
Intel Leverages the Power of Big Data
Malware: millions of new malware samples per quarter (1)
Cyber attacks: millions of U.S. cyber attacks per day (2)
(1) “McAfee Threats Report: Second Quarter 2012,” McAfee, www.mcafee.com/us/resources/reports/rp-quarterly-threat-q2-2012.pdf (PDF)
(2) Koebler, Jason, “U.S. Nukes Face Up to 10 Million Cyber Attacks Daily,” U.S. News & World Report (2012), www.usnews.com/news/articles/2012/03/20/us-nukes-face-up-to-10-million-cyber-attacks-daily
Chip Design Validation: Cut Product Time to Market by 25%
Faster analysis process for validating results
Streamlined debug process through analysis of large volumes of historical test data
Reseller Channel Management: Increased sales by $5M per Qtr. Decreased cost by $6M per Qtr.
Smarter reseller engagement prioritization by leveraging advanced customer profile algorithms
Cost efficient detection of non-compliant claims
Malware Detection: Proof of Concept (POC). Collecting and analyzing large amounts of server security data at the system, network, and application levels leads to discovery of new malware threats before they arise.
Virtuous Cycle of Data… Inside and Outside the Box
[Diagram: data at the center of a cycle. Transform / Analyze: Compute. Move: Networking. Persist: Storage.]
Reimagining the Possibilities with Big Data Analytics Move to Value & Vision
Enhance understanding, drive innovation, and accelerate personalized medical cures
Create new business models and transform organizational processes
Enhance public safety and transportation, increase energy efficiency and reduce carbon footprint
Big Data Adoption and Deployment Phases
Investigate: Understand business model; Organization alignment; Market & technology trends
Discover: Define problem statement; Identify business use cases; Gather requirements; Develop success metrics
Plan: Identify high value, high visibility use cases; Define scope & ROI for proof of concept (POC); Identify Big Data reference architecture
Implement: Pilot the POC; Promote and extend the POC result for other projects; Extend and enhance more advanced analytic capabilities
• What insights would best benefit your business?
• What results are you really trying to get?
• What do you want to do with your data?
• What kind of data & correlations are you interested in mining?
• How much return are you expecting on your investment?
• What is your timeline for getting results?
• Are there other industries or uses you’re using for a model?
Intel in Big Data
Intel’s Foundational Technologies Offer Advanced Solutions for Big Data Analytics
Big Data Building Blocks: Responsive, Energy Efficient, High Availability, Secure, Choice
• Compute: Intel® Xeon® Product Family E3-E5-E7; Intel® Atom™; Intel® Xeon Phi™
• Network: Intel® Ethernet Controllers; Intel® Ethernet Adapters; Intel® Ethernet Switch Silicon; Intel® True Scale Fabric
• Storage: Intelligent Storage (1); Scale-out Storage (1); Scale-up Storage (1); Intel® SSD 710 series, DC S3700 (SATA); Intel® SSD 910 series (PCIe)
• Software & Technologies: Intel® Contribution to Open Source Hadoop; Intel® Data Center Manager; Intel® Node Manager; Intel® Expressway Service Gateway; Intel® Cache Acceleration Software; Intel’s Lustre; Intel® VT and Intel® TXT; Intel® AES-NI
(1) Xeon-based storage systems are available in a wide range of configuration options from the industry’s leading storage vendors
Unleash the Power of the Platform
TeraSort for 1TB sort
• Baseline (Intel® Xeon® 5600, HDD, 1GbE): >4 hour process time
• Upgrade processor: ~50% reduction
• Upgrade to SSD: ~80% reduction
• Upgrade to 10GbE: ~50% reduction
• Open Source Contributions: ~40% reduction
• Hadoop processing time: <10 minutes
Nearly 50x increase in your ability to discover insights
*Other brands and names are the property of their respective owners
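The way the successive reductions on this slide combine can be sketched in Python. The sketch assumes each percentage applies to the time remaining after the previous upgrade, and it takes exactly 240 minutes as the baseline even though the slide only says ">4 hour"; both are assumptions, and the percentages are the rounded values shown on the slide.

```python
BASELINE_MINUTES = 240.0  # assumption: exactly 4 hours; the slide says ">4 hour"

# (upgrade step, approximate fractional reduction quoted on the slide)
STEPS = [
    ("Upgrade processor", 0.50),
    ("Upgrade to SSD", 0.80),
    ("Upgrade to 10GbE", 0.50),
    ("Open source contributions", 0.40),
]


def apply_reductions(start_minutes, steps):
    """Apply each reduction to the time left after the previous step."""
    remaining = start_minutes
    timeline = []
    for name, reduction in steps:
        remaining *= 1.0 - reduction
        timeline.append((name, remaining))
    return timeline


timeline = apply_reductions(BASELINE_MINUTES, STEPS)
for name, minutes in timeline:
    print(f"{name:<28}{minutes:6.1f} min")

final_minutes = timeline[-1][1]
print(f"Overall speedup: {BASELINE_MINUTES / final_minutes:.0f}x")
```

With these assumptions the final figure lands near 7 minutes, consistent with the slide's "<10 minutes". The computed ratio depends directly on the assumed baseline and the rounded percentages, so it should not be read as a reproduction of the slide's "nearly 50x" figure.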
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with
other products. Source: Intel Internal testing
For more information go to: intel.com/performance
Intel Intelligence at the Edge
Intel® Intelligent Systems Framework: Simplifying the Internet of Things
• Driving Secure Interoperability: Billions of devices that need to share data with each other and the cloud
• Unlocking Edge Data: Edge systems need to react to streaming data in real time
• Filtering Data: Data volume outpacing network and storage efficiency
Wind River Intelligent Device Platform
Connectivity
• Pre-integrated smart and connected capabilities enable rich network options to save development time and costs
• Validated and flexible firmware providing an extensive network of connectivity choices, including broad modem support and PAN, LAN, and WAN network access
Manageability
• Platform customization significantly reduces time to product while increasing productive life of M2M devices
• Intuitive web-based tool reduces configuration and support costs and allows for anytime provisioning and management of devices
• Dynamic post dynamic “Services” framework (OSGi) enables modularized, hardware agnostic deployment of new apps
Security
• Security features designed for M2M development that protect critical data throughout the device lifecycle
• Customizable SRM to ensure the integrity of the end devices via secure boot, provide encrypted communication between device and management console in the cloud, and offer device resource management to limit system exposure of untrusted applications
Intel Data Research
Dozens of Academic, Industry & Gov Research Collaborations
GraphLab, GraphBuilder
Disaggregation / Silicon Photonics
• Worry-Free Data: Protections to allow you to control how your data is used
• Future Data Experiences: Data Economy, Vibrant Data, Data Visualization
• Analytical Applications: Model-Based workloads, Everyday Analytics
• Machine Learning Algorithms: Relationship Discovery, Real-time learning
• Distributed Data Computing: Graph, parallel, statistical computing partitioned across cloud, client and edge
• Data Ecosystem Infrastructure: Platforms, architectures, DBMSs, smart networks, and the Internet of Things
Summary: Intel in Big Data
The pervasiveness of Intel Architecture democratizes the implementation and performance of Big Data everywhere
• Accelerate analytics: CPU, storage, and network
• Optimized ISV software stacks and services
• Foster the growth of market partners
• Solution research and academia engagement
• Distribute analytics to the edge
BioScience: Genomics for Translational Medicine
Hadoop for Data Correlation & Discovery Insights
Challenge: Derive new value added patient discovery services while bringing down genome processing costs
Solution: Dynamically partition/scale HBase for correlation of patient data to all public data
Benefits: Contributes to 800x reduction in cost to process 4 M genome variants
Data Characteristics:
• 10-node HBase cluster
• Billions of pre-computed correlations
• 1 genome: 10 million rows
• 100 genomes: 1 billion rows
• 1M genomes: 10 trillion rows
• 100M genomes: 1 quadrillion (1,000,000,000,000,000) rows
[Diagram labels: Data Ingest; Billions of Pre-computed Correlations; New Biomed Info-Products]
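The row counts quoted on this slide follow a simple linear rule of 10 million rows per genome. A short Python sketch makes the scaling explicit; the constant is taken from the "1 genome" figure, and the assumption that every genome contributes the same number of rows is a simplification for illustration.

```python
ROWS_PER_GENOME = 10_000_000  # from the slide: 1 genome = 10 million rows


def rows_for(genomes: int) -> int:
    """Rows needed if every genome contributes the same number of rows."""
    return genomes * ROWS_PER_GENOME


for genomes in (1, 100, 1_000_000, 100_000_000):
    print(f"{genomes:>11,} genomes -> {rows_for(genomes):,} rows")
```

At 100 million genomes the table reaches 10^15 rows, the quadrillion figure shown on the slide, which is the scale the partitioning approach has to be designed for.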
Public Sector: Smart Traffic Intelligent Transport System
Hadoop for Predictive Analytics
Challenge: Analyze city traffic to derive statistics for crime prevention, info sharing, and predictive traffic analysis
Solution: Embed HBase client in camera for real-time inserts of structured/unstructured data
Benefits:
• Automated queries for traffic violation
• Data mining of fake licenses: <1 minute for all data captured for a week
• Predictive traffic forecasting
Data Characteristics:
• 30,000+ camera data collection points
• Petabytes of traffic data & terabytes of images
• 2 billion HBase records
[Diagram labels: App Servers; Regional Data Collection; Distributed Processing Across District Nodes; Derived Analytics Services; Crime Prevention; Citizen Traffic Services]
Telco: China Unicom
Hadoop for Behavioral Analysis
Challenge: Analyze subscriber web usage and billing to derive new information products
Solution: Scale out storage based on HBase with network optimization based on web traffic, log analysis for daily reporting
Benefits: New customer segmentation
Data Characteristics:
• 188 nodes, 14TB/server
• 2.5PB raw disk capacity
• High speed data loading
• Real-time query (latency <1s)
• Daily statistics & reports (sum, count, join, etc.)
[Diagram labels: Subscriber Usage & Billing; ETL; Storage, Analytics (MapReduce/Hive, HBase, HDFS; Log Analysis, Daily Reports); New Customer Segmentation & Insights]
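The cluster figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes the 14TB/server figure applies uniformly to all 188 nodes, which the slide does not state explicitly, and shows the total under both decimal and binary unit conventions.

```python
NODES = 188
TB_PER_SERVER = 14  # assumed to apply to every node

total_tb = NODES * TB_PER_SERVER
total_pb_decimal = total_tb / 1000  # 1 PB = 1000 TB
total_pb_binary = total_tb / 1024   # 1 PB = 1024 TB

print(f"Total raw capacity: {total_tb} TB")
print(f"  decimal units: {total_pb_decimal:.2f} PB")
print(f"  binary units:  {total_pb_binary:.2f} PB")
```

Either convention gives a value a little above 2.5 PB, so the quoted raw disk capacity is consistent with the node count and per-server size to within rounding.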
The Big Data HPC connection
Lustre: The Most Used HPC File System
Based on Intel analysis of the 11/2013 Top500, Lustre accounts for:
1. 7 of the fastest 10 systems in the world
2. And ~60% of the top 50 systems
Per May 2014 survey research from IDC, Lustre is used by more than 50% of sites (1)
• 16-20% use shares for GPFS, NFS, and pNFS
• HDFS is shown only to reflect the extent of HPC sites deploying Hadoop workloads onto HPC (diskless) storage platforms
[Figure: file system usage shares for Lustre, GPFS, NFS, pNFS, Red Hat, and HDFS]
(1) Source: IDC survey research, May 2014. Shares rounded to nearest percent; totals exceed 100% due to the use of multiple file systems.
The Intel® Solutions for Lustre Portfolio
Intel® Enterprise Edition for Lustre* software v2
• Simple, powerful management tools added to full Lustre release foundation
• Maximum performance with minimal management complexity and costs
• Sold through global reseller network (composed of OEMs and integrators)
Intel® Cloud Edition for Lustre* software
• Fast, cost effective parallel storage for applications deployed on cloud infrastructure
• Uses Amazon AWS storage and compute (EC2) instances
• Multiple support options available
• Sold today via Amazon Web Services Marketplace
Enterprise Edition v2: Features and Improvements
Intel® Manager for Lustre
• Streamlined configuration and management workflow
• Advanced charting and reporting
• Intel® Manager for Lustre support for HSM
• Support for larger sized configurations
Storage Servers
• Full distribution of open source Lustre v2.5
• Support for Red Hat Enterprise Linux 6.4, CentOS 6.4
• New support for SUSE SLES 11 servers (cannot use SUSE for IML)
Compute Clients
• Native Lustre client for Intel® Xeon Phi™
• Improved client I/O performance
• Expanded client platform support
Native Lustre Client for Intel® Xeon Phi™
• Native Lustre client for Intel® Xeon Phi™
• Allows applications running on Phi to have direct access to fast, scalable storage resource
• Benefit: Improved I/O performance for Xeon Phi™ applications
[Figure: IOZONE benchmark using 32 threads. Write and read throughput in MB/sec (axis 0 to 300) for NFS over Virtual Ethernet versus Lustre over Virtual IB; the chart is annotated 10X. (1)]
1 Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
Lustre: Ideal for Hadoop Workloads
Convergence of HPC and data analytics
• Desire for HPC systems to run Hadoop workloads
• Hadoop is the most popular software stack for big data analytics
• Lustre is the file system of choice for HPC clusters
• Challenge: Use Lustre with Hadoop
Benefits of using Enterprise Edition for Lustre with Hadoop applications
• Improved application performance without changing applications
• More efficient and productive storage resources
• No data transfer overhead for staging inputs and extracting results
• Eliminates 3-way replication used by HDFS
• Shared, easily managed storage: no need to arbitrarily partition storage into HPC (Lustre) and Analytics (HDFS) islands
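The storage-efficiency point about 3-way replication can be illustrated with a small Python sketch. The 100 TB dataset size is hypothetical, and the single-copy case deliberately ignores any RAID or other redundancy overhead in the backing storage, so it is a simplification rather than a sizing guide.

```python
def raw_capacity_tb(dataset_tb: float, copies: int) -> float:
    """Raw capacity consumed when each block is stored `copies` times."""
    return dataset_tb * copies


DATASET_TB = 100.0  # hypothetical dataset size, for illustration only

hdfs_raw = raw_capacity_tb(DATASET_TB, copies=3)    # 3-way replication, as cited for HDFS
single_raw = raw_capacity_tb(DATASET_TB, copies=1)  # one copy; backend redundancy not modeled

print(f"With 3-way replication: {hdfs_raw:.0f} TB raw")
print(f"With a single copy:     {single_raw:.0f} TB raw")
print(f"Raw capacity avoided:   {hdfs_raw - single_raw:.0f} TB")
```

Under these assumptions, two thirds of the raw capacity used by a replicated layout goes to extra copies, which is the overhead the slide says a shared Lustre file system avoids.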
Enterprise Edition for Lustre v2
[Diagram components]
• Optimized Storage for Hadoop Applications
• Hierarchical Storage Management
• Monitoring & data movement tools
• Intel® Manager for Lustre: Configure, Troubleshoot, Monitor, Manage
• CLI
• REST API Extensibility
• Management and Monitoring Services
• Lustre File System: Full distribution of open source Lustre software v2.5
• Storage Plug-In
• Integration
Legend: Open source base; Intel value-add for Lustre
Interoperability with Hadoop distributions for fast, shared, simple to manage storage for MapReduce applications
Thank You