Date post: | 28-Oct-2014 |
Category: | Technology |
Upload: | intel-it-center |
Open Platform for Next-Gen Analytics
Boyd Davis
VP Intel Architecture Group, GM Datacenter Software Division
@IntelITS
Today’s presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. Please refer to our most recent Earnings Release and our most recent Form 10-Q or 10-K filing for more information on the risk factors that could cause actual results to differ. If we use any non-GAAP financial measures during the presentations, you will find on our website, intc.com, the required reconciliation to the most directly comparable GAAP financial measure. INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product roadmaps. Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. 
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
Legal Information
Making sense of one petabyte
To read: 50x the Library of Congress
To view as HD video: 13 years
To generate in 2012: 11 seconds
http://blogs.loc.gov/digitalpreservation/2011/07/transferring-libraries-of-congress-of-data/
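The three petabyte comparisons above can be sanity-checked with a few lines of arithmetic. The sketch below is not from the presentation: it assumes a decimal petabyte (10^15 bytes) and a 365.25-day year, and the function names are made up for illustration. It derives the HD video bitrate implied by "13 years", the yearly data volume implied by "11 seconds", and the Library of Congress size implied by "50x".

```python
PETABYTE = 10**15                       # bytes; decimal definition is an assumption
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # average year length, also an assumption


def hd_bitrate_mbps(years=13):
    """Bitrate (Mbit/s) at which one petabyte lasts `years` of continuous playback."""
    return PETABYTE * 8 / (years * SECONDS_PER_YEAR) / 1e6


def yearly_output_zb(seconds_per_petabyte=11):
    """Zettabytes produced per year if one petabyte is generated every N seconds."""
    return PETABYTE / seconds_per_petabyte * SECONDS_PER_YEAR / 1e21


def library_of_congress_tb(multiple=50):
    """Size in terabytes of one 'Library of Congress' if a petabyte is `multiple` of them."""
    return PETABYTE / multiple / 1e12


if __name__ == "__main__":
    print(f"Implied HD bitrate: {hd_bitrate_mbps():.1f} Mbit/s")
    print(f"Implied 2012 data volume: {yearly_output_zb():.2f} ZB/year")
    print(f"Implied Library of Congress size: {library_of_congress_tb():.0f} TB")
```

Under these assumptions the slide's figures are mutually plausible: the implied bitrate falls in the range of common HD video encodings, and the implied yearly volume is a few zettabytes.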
Analysis of data can transform society
Enhance scientific understanding, drive innovation, and accelerate medical cures
Create new business models and improve organizational processes
Increase public safety and improve energy efficiency with smart grids
Virtuous cycle of data-driven user experience
CLOUD: Richer data to analyze
CLIENTS: Richer user experiences
INTELLIGENT SYSTEMS: Richer data from devices
Intel at the intersection of forces behind big data
HPC: Enabling exascale computing on massive data sets
Cloud: Helping enterprises build open interoperable clouds
Open Source: Contributing code and fostering ecosystem
Intel® TrueScale InfiniBand
* Other names and brands may be claimed as the property of others.
Democratize data analysis from edge to cloud
Unlock value in silicon
Support open platforms
Deliver software value
Research
Benchmarking
Tuning Optimization
Product
History of Intel and Apache Hadoop*
Timeline, 2009 to 2013:
Open Cirrus*
HiBench Release 1.0 (2011)
Release 2.0 (2012)
Telco, Smart City, Web, Retail, Healthcare
Announcing availability of Intel® Distribution for Apache Hadoop* software
Hardware-enhanced performance & security
Enables partner innovation in analytics
Strengthens Apache Hadoop* ecosystem
Intel® Distribution for Apache Hadoop* software
All external names and brands are claimed as the property of others.
Intel® Manager for Apache Hadoop software: Deployment, Configuration, Monitoring, Alerts, and Security
HDFS 2.0.3: Hadoop Distributed File System
YARN (MRv2): Distributed Processing Framework
HBase 0.94.1: Columnar Store
ZooKeeper 3.4.5: Coordination
Flume 1.3.0: Log Collector
Sqoop 1.4.1: Data Exchange
Pig 0.9.2: Scripting
Hive 0.9.0: SQL Query
Oozie 3.3.0: Workflow
Mahout 0.7: Machine Learning
R connectors: Statistics
Intel enhancements contributed back to open source
Open source components included without change
Intel proprietary
Intel® Distribution for Apache Hadoop* software
• Up to 20x faster decryption with AES-NI*
• Optimized with SSD and Cache Acceleration
• Up to 8.5X faster queries in Hive*
• Hardware-enhanced compression with AVX & SSE4.2
• Automated tuning with Intel® Active Tuner
*Based on internal testing
Sold with World-Class Intel Support
Annual Subscription with Technical Support
Support Coverage Options: 24x7 or 8x5
Via Solution Vendors and Service Providers
Backed by broad portfolio of datacenter products
Software
Network
Storage & Memory
Server
Cache Acceleration Software
Paul Perez Vice President and GM
Data Center Group
Intel portfolio delivers balanced performance
Baseline (Intel® Xeon® 5690, 7200 RPM HDD, 1GbE Adapter): >4 hours
Intel® Xeon® E5-2690 processor: ~50% improved
Intel® SSD 520 Series: ~80% improved
Intel® 10GbE Adapters: ~50% improved
Intel® Distribution for Apache Hadoop* software: ~40% improved
Result: ~7 minutes
Other brands and names are the property of their respective owners
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel internal testing. For more information go to: intel.com/performance
Shown to improve 1 Terabyte sort from 4 hours to 7 minutes
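The slide does not say how the four "~N% improved" figures combine. One reading that reproduces the quoted end points is that each upgrade cuts the remaining sort time by the stated fraction, applied in the order shown. The sketch below works through that reading. The interpretation, the exact 240-minute baseline (the slide says ">4 hours"), and all names in the code are assumptions, not Intel's stated method.

```python
BASELINE_MINUTES = 4 * 60  # slide says ">4 hours"; exactly 240 minutes is an assumption

# (upgrade, fractional runtime reduction) in slide order.
# Reading "~N% improved" as "runtime drops by N%" is an assumption.
UPGRADES = [
    ("Xeon E5-2690 processor", 0.50),
    ("SSD 520 Series", 0.80),
    ("10GbE adapters", 0.50),
    ("Intel Distribution for Apache Hadoop", 0.40),
]


def apply_upgrades(baseline_minutes, upgrades):
    """Return [(name, minutes_after)], applying each reduction to the previous runtime."""
    steps = []
    minutes = float(baseline_minutes)
    for name, reduction in upgrades:
        minutes *= 1.0 - reduction
        steps.append((name, minutes))
    return steps


if __name__ == "__main__":
    steps = apply_upgrades(BASELINE_MINUTES, UPGRADES)
    for name, minutes in steps:
        print(f"{name}: {minutes:.1f} min")
    final = steps[-1][1]
    print(f"Overall speedup: {BASELINE_MINUTES / final:.1f}x")
```

Under this reading the compounded result lands at roughly 7 minutes, which matches the slide's headline. Because the reductions multiply, the order of the upgrades does not change the final figure, only the intermediate ones.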
Proven in the enterprise
Using the Intel® Distribution to gain tremendous results
IT
Satnam Alag Vice President and CTO
Delivering innovation in the open
Pipeline of innovation from Intel Labs
• Machine Learning
• Data-Intensive Algorithms & Computer Architecture
Roadmap of open source from Intel Software
• Project Panthera: Standard SQL on Apache Hadoop
• Project Rhino: Hardening Apache Hadoop
Lighting up unused data for big impact
[Chart: Intel® Xeon® processor growth from big data use. Y axis: Units. X axis: 2013 to 2017. Curves: "Apache Hadoop landing on Intel Xeon" and "Intel accelerating adoption of Hadoop +". Annotation: "2 years faster".]
With broad support from the ecosystem
Enabling partner innovation in next-gen analytics
Paul Perez, Vice President and GM Data Center Group
Richard Pledereder, Senior Vice President SAP® HANA* Engineering
Steve Garrou, Vice President Global Solutions
Ranga Rangachari, Vice President and GM Storage Business
Summary
• Intel announced Intel® Distribution for Apache Hadoop* software
• Delivers hardware-enhanced capabilities and software enhancements
• Backed by broad portfolio of Intel data center products
• Contributes to open source and supports Apache Hadoop
• Enabling ecosystem of partners to innovate on analytics solutions
Q&A
Legal Disclaimers
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.2. For more information, visit http://www.intel.com/technology/security
Intel, Intel Xeon, Intel Atom, Intel Xeon Phi, Intel Itanium, the Intel Itanium logo, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Other names and brands may be claimed as the property of others.
Copyright © 2013, Intel Corporation. All rights reserved.
Apache Hadoop Performance Test Configuration: 4 hours to 7 minutes
Cluster Configuration
• 1 Head Node (name node, job tracker)
• 10 Workers (data nodes, task trackers)
• 10-Gigabit Switch: Cisco Nexus 5020
Software Configuration
• Intel Distribution for Apache Hadoop 2.1.1
• Apache Hadoop 1.0.3
• RHEL 6.3
• Oracle Java 1.7.0_05
Head Node Hardware: 1 x Dell R710 1U server
• Intel: 2 x 3.47GHz Intel® Xeon® processor X5690
• Memory: 48 GB RAM
• Storage: 10K SAS HDD
• Intel® Ethernet 10 Gigabit SFP+
• Intel® Ethernet 1 Gigabit
Worker Node Hardware: 10 x Dell R720 2U servers
• Intel: 2 x 2.90GHz Intel® Xeon® processor E5-2690
• Memory: 128 GB RAM
• Storage: 520 Series SSDs
• Intel® Ethernet 10 Gigabit SFP+
• Intel® Ethernet 1 Gigabit
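To put the configuration above in perspective, the sketch below estimates the sort throughput implied by a 1 terabyte sort finishing in about 7 minutes on the 10 worker nodes listed. It is a rough calculation, not part of Intel's test report. It assumes a decimal terabyte, an even spread of data across workers, and the fixed 100-byte record that the TeraGen/TeraSort benchmark uses. The function names are made up for illustration.

```python
TERABYTE = 10**12          # bytes; decimal terabyte is an assumption
TERASORT_ROW_BYTES = 100   # TeraGen writes fixed-size 100-byte records
WORKERS = 10               # worker (data node) count from the listed cluster


def terasort_rows(data_bytes=TERABYTE):
    """Number of 100-byte records needed to produce `data_bytes` of sort input."""
    return data_bytes // TERASORT_ROW_BYTES


def sort_throughput_mb_s(minutes, data_bytes=TERABYTE, workers=WORKERS):
    """Return (aggregate, per-worker) throughput in MB/s, measured on input size only."""
    aggregate = data_bytes / (minutes * 60) / 1e6
    return aggregate, aggregate / workers


if __name__ == "__main__":
    aggregate, per_worker = sort_throughput_mb_s(7)
    print(f"Records to sort: {terasort_rows():,}")
    print(f"Aggregate: {aggregate:.0f} MB/s, per worker: {per_worker:.0f} MB/s")
```

These figures count each input byte once. A real sort reads, shuffles, and writes the data, so the disk and network traffic per node is a multiple of the per-worker number shown here.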
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance