Intel And Big Data: An Open Platform for Next-Gen Analytics

Date post: 28-Oct-2014
Upload: intel-it-center
Description:
On Intel as a platform for big data: Boyd Davis, VP of the Intel Architecture Group and GM of the Datacenter Software Division, discusses Intel's contributions to and expansion of Hadoop, a foundational technology, as a means to enrich business intelligence and analysis from the edge to the cloud. Head to http://intel.com/bigdata to learn more.
Transcript
Page 1: Intel And Big Data: An Open Platform for Next-Gen Analytics

Open Platform for Next-Gen Analytics

Boyd Davis

VP, Intel Architecture Group; GM, Datacenter Software Division

@IntelITS

Page 2: Intel And Big Data: An Open Platform for Next-Gen Analytics

Today’s presentations contain forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. Please refer to our most recent Earnings Release and our most recent Form 10-Q or 10-K filing for more information on the risk factors that could cause actual results to differ. If we use any non-GAAP financial measures during the presentations, you will find on our website, intc.com, the required reconciliation to the most directly comparable GAAP financial measure. INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product roadmaps. Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. 
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

Legal Information

Page 3: Intel And Big Data: An Open Platform for Next-Gen Analytics

Making sense of one petabyte

•  50x: to read, in Library of Congress

•  13y: to view, as HD video

•  11s: to generate, in 2012

http://blogs.loc.gov/digitalpreservation/2011/07/transferring-libraries-of-congress-of-data/
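
The viewing and generation figures on this slide can be sanity-checked with simple arithmetic. The sketch below is not from the presentation; it assumes a decimal petabyte (10^15 bytes) and a 365.25-day year, and the function names are illustrative. It derives the video bitrate implied by "13 years to view" and the yearly data volume implied by "11 seconds to generate".

```python
# Back-of-the-envelope check of the slide's one-petabyte figures.
PETABYTE_BYTES = 10**15                # assumption: decimal petabyte
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: 365.25-day year


def implied_video_bitrate_mbps(total_bytes: float, years: float) -> float:
    """Bitrate (megabits per second) at which total_bytes plays back in `years`."""
    return total_bytes * 8 / (years * SECONDS_PER_YEAR) / 1e6


def implied_annual_volume_zb(total_bytes: float, seconds: float) -> float:
    """Yearly volume (zettabytes) if total_bytes is generated every `seconds`."""
    return total_bytes / seconds * SECONDS_PER_YEAR / 1e21


if __name__ == "__main__":
    print(f"{implied_video_bitrate_mbps(PETABYTE_BYTES, 13):.1f} Mbit/s")
    print(f"{implied_annual_volume_zb(PETABYTE_BYTES, 11):.2f} ZB/year")
```

Under these assumptions the slide's numbers correspond to a stream of roughly 19.5 Mbit/s, which is a plausible HD video bitrate, and to a worldwide generation rate of a little under 2.9 zettabytes per year.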

Page 4: Intel And Big Data: An Open Platform for Next-Gen Analytics

Analysis of data can transform society

Enhance scientific understanding, drive innovation, and accelerate medical cures

Create new business models and improve organizational processes

Increase public safety and improve energy efficiency with smart grids

Page 5: Intel And Big Data: An Open Platform for Next-Gen Analytics

Virtuous cycle of data-driven user experience

[Cycle diagram. Nodes: CLOUD, CLIENTS, INTELLIGENT SYSTEMS. Flows: richer data to analyze; richer data from devices; richer user experiences.]

Page 6: Intel And Big Data: An Open Platform for Next-Gen Analytics

Intel at the intersection of forces behind big data

HPC: Enabling exascale computing on massive data sets

Cloud: Helping enterprises build open interoperable clouds

Open Source: Contributing code and fostering ecosystem

Intel® TrueScale Infiniband

* Other names and brands may be claimed as the property of others.

Page 7: Intel And Big Data: An Open Platform for Next-Gen Analytics

Democratize data analysis from edge to cloud

Unlock value in silicon

Support open platforms

Deliver software value

Page 8: Intel And Big Data: An Open Platform for Next-Gen Analytics

History of Intel and Apache Hadoop*

[Timeline, 2009 to 2013. Phases: Research, Benchmarking, Tuning, Optimization, Product. Milestones: Open Cirrus*, HiBench, Release 1.0 (2011), Release 2.0 (2012). Also labeled: Telco, Smart City, Web, Retail, Healthcare.]

* Other names and brands may be claimed as the property of others.

Page 9: Intel And Big Data: An Open Platform for Next-Gen Analytics

Announcing availability of Intel® Distribution for Apache Hadoop* software

Hardware-enhanced performance & security

Enables partner innovation in analytics

Strengthens Apache Hadoop* ecosystem

* Other names and brands may be claimed as the property of others.

Page 10: Intel And Big Data: An Open Platform for Next-Gen Analytics

Intel® Distribution for Apache Hadoop* software

All external names and brands are claimed as the property of others.

   

•  Intel® Manager for Apache Hadoop software: Deployment, Configuration, Monitoring, Alerts, and Security

•  HDFS 2.0.3: Hadoop Distributed File System

•  YARN (MRv2): Distributed Processing Framework

•  HBase 0.94.1: Columnar Store

•  Zookeeper 3.4.5: Coordination

•  Flume 1.3.0: Log Collector

•  Sqoop 1.4.1: Data Exchange

•  Pig 0.9.2: Scripting

•  Hive 0.9.0: SQL Query

•  Oozie 3.3.0: Workflow

•  Mahout 0.7: Machine Learning

•  R connectors: Statistics

Legend:

Intel enhancements contributed back to open source

Open source components included without change

Intel proprietary

Page 11: Intel And Big Data: An Open Platform for Next-Gen Analytics

Intel® Distribution for Apache Hadoop* software

•  Up to 20x faster decryption with AES-NI*

•  Optimized with SSD and Cache Acceleration

•  Up to 8.5X faster queries in Hive*

•  Hardware-enhanced compression with AVX & SSE4.2

•  Automated tuning with Intel® Active Tuner

*Based on internal testing
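
The hardware features named above (AES-NI, AVX, SSE4.2) are advertised by the processor, so a node can be checked for them before any of these gains are expected. The following is a minimal sketch, not part of the Intel Distribution: it assumes a Linux host, where `/proc/cpuinfo` lists these features under the flag names `aes`, `avx`, and `sse4_2`, and the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Linux /proc/cpuinfo flag names for the features named on the slide.
FEATURE_FLAGS = {"AES-NI": "aes", "AVX": "avx", "SSE4.2": "sse4_2"}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (for example, on non-x86 systems)


def feature_support(cpuinfo_text: str) -> dict[str, bool]:
    """Map each slide feature to whether the CPU advertises it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # present only on Linux
    if cpuinfo.exists():
        print(feature_support(cpuinfo.read_text()))
```

This only reports what the CPU exposes; whether a given Hadoop build actually uses those instructions depends on its codecs and cryptographic libraries.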

Page 12: Intel And Big Data: An Open Platform for Next-Gen Analytics

Sold with World-Class Intel Support

Annual Subscription with Technical Support

Support Coverage Options: 24x7 or 8x5

Via Solution Vendors and Service Providers

Page 13: Intel And Big Data: An Open Platform for Next-Gen Analytics

Backed by broad portfolio of datacenter products

Software, Network, Storage & Memory, Server

Cache Acceleration Software

Page 14: Intel And Big Data: An Open Platform for Next-Gen Analytics

* Other names and brands may be claimed as the property of others.

Paul Perez Vice President and GM

Data Center Group

Page 15: Intel And Big Data: An Open Platform for Next-Gen Analytics

Intel portfolio delivers balanced performance

Baseline: Intel® Xeon 5690, 7200 HDD, 1GbE Adapter: >4 hours

•  Intel® Xeon® E5-2690 processor: ~50% improved

•  Intel® SSD 520 Series: ~80% improved

•  Intel® 10GbE Adapters: ~50% improved

•  Intel® Distribution for Apache Hadoop* software: ~40% improved

Result: ~7 minutes

Other brands and names are the property of their respective owners

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel internal testing. For more information go to: intel.com/performance

Shown to improve 1 Terabyte sort from 4 hours to 7 minutes
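
One way to read the four "improved" figures on this slide is as successive reductions in run time, each applied to the result of the previous upgrade. That reading is an assumption, not something the slide states; the sketch below also assumes a baseline of exactly 4 hours (the slide says ">4 hours"), and the names are illustrative.

```python
from functools import reduce

BASELINE_MINUTES = 4 * 60  # assumption: exactly 4 hours

# (upgrade step, fractional run-time reduction quoted on the slide)
STEPS = [
    ("Intel Xeon E5-2690 processor", 0.50),
    ("Intel SSD 520 Series", 0.80),
    ("Intel 10GbE adapters", 0.50),
    ("Intel Distribution for Apache Hadoop", 0.40),
]


def compound_runtime(baseline: float, reductions: list) -> float:
    """Apply each fractional reduction to the run time left by the previous step."""
    return reduce(lambda minutes, r: minutes * (1.0 - r), reductions, baseline)


if __name__ == "__main__":
    remaining = BASELINE_MINUTES
    for name, reduction in STEPS:
        remaining *= 1.0 - reduction
        print(f"after {name}: {remaining:.1f} min")
```

Compounded this way, 240 minutes shrinks to about 7.2 minutes, which agrees with the "~7 minutes" headline and suggests the percentages are meant to be multiplied rather than added.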

Page 16: Intel And Big Data: An Open Platform for Next-Gen Analytics

Proven in the enterprise

Using the Intel® Distribution to gain tremendous results

* Other names and brands may be claimed as the property of others.

IT

Page 17: Intel And Big Data: An Open Platform for Next-Gen Analytics

* Other names and brands may be claimed as the property of others.

Satnam Alag Vice President and CTO

Page 18: Intel And Big Data: An Open Platform for Next-Gen Analytics

Delivering innovation in the open

Pipeline of innovation from Intel Labs

•  Machine Learning

•  Data-Intensive Algorithms & Computer Architecture

Roadmap of open source from Intel Software

•  Project Panthera: Standard SQL on Apache Hadoop

•  Project Rhino: Hardening Apache Hadoop

Page 19: Intel And Big Data: An Open Platform for Next-Gen Analytics

Lighting up unused data for big impact

[Chart: Intel® Xeon processor growth from big data use. Vertical axis: Units. Horizontal axis: 2013 to 2017. Annotations: "Apache Hadoop landing on Intel Xeon"; "Intel accelerating adoption of Hadoop +"; "2 years faster".]

Page 20: Intel And Big Data: An Open Platform for Next-Gen Analytics

With broad support from the ecosystem

* Other names and brands may be claimed as the property of others.

Page 21: Intel And Big Data: An Open Platform for Next-Gen Analytics

Enabling partner innovation in next-gen analytics

Paul Perez, Vice President and GM Data Center Group

Richard Pledereder, Senior Vice President SAP® HANA* Engineering

Steve Garrou, Vice President Global Solutions

Ranga Rangachari, Vice President and GM Storage Business

Page 22: Intel And Big Data: An Open Platform for Next-Gen Analytics

Summary

•  Intel announced Intel® Distribution for Apache Hadoop* software

•  Delivers hardware-enhanced capabilities and software enhancements

•  Backed by broad portfolio of Intel data center products

•  Contributes to open source and supports Apache Hadoop

•  Enabling ecosystem of partners to innovate on analytics solutions

Page 23: Intel And Big Data: An Open Platform for Next-Gen Analytics

Q&A

Page 24: Intel And Big Data: An Open Platform for Next-Gen Analytics

Legal Disclaimers

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization

No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.2. For more information, visit http://www.intel.com/technology/security

Intel, Intel Xeon, Intel Atom, Intel Xeon Phi, Intel Itanium, the Intel Itanium logo, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Other names and brands may be claimed as the property of others.

Copyright © 2013, Intel Corporation. All rights reserved.

Page 25: Intel And Big Data: An Open Platform for Next-Gen Analytics

Apache Hadoop Performance Test Configuration: 4 hours to 7 minutes

Cluster Configuration
•  1 Head Node (name node, job tracker)
•  10 Workers (data nodes, task trackers)
•  10-Gigabit Switch: Cisco Nexus 5020

Software Configuration
•  Intel Distribution for Apache Hadoop 2.1.1
•  Apache Hadoop 1.0.3
•  RHEL 6.3
•  Oracle Java 1.7.0_05

Head Node Hardware
•  1 x Dell r710 1U server
   -  Intel: 2 x 3.47GHz Intel® Xeon® processor X5690
   -  Memory: 48 GB RAM
   -  Storage: 10K SAS HDD
   -  Intel® Ethernet 10 Gigabit SFP+
   -  Intel® Ethernet 1 Gigabit

Worker Node Hardware
•  10 x Dell r720 2U servers
   -  Intel: 2 x 2.90GHz Intel® Xeon® processor E5-2690
   -  Memory: 128 GB RAM
   -  Storage: 520 Series SSDs
   -  Intel® Ethernet 10 Gigabit SFP+
   -  Intel® Ethernet 1 Gigabit

Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Note: The below disclaimer should be included whenever the general performance disclaimer is used, but should be numbered separately: Configurations: [describe config + what test used + who did testing]. For more information go to http://www.intel.com/performance
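
To put the two run times in terms of data rate, the sketch below converts the 1 terabyte sort into sorted megabytes per second, in aggregate and per worker. It is illustrative only: it assumes a decimal terabyte (10^12 bytes), exact run times of 4 hours and 7 minutes, and that both runs used the 10 workers listed above, which the slides do not confirm for the baseline run.

```python
TERABYTE_BYTES = 10**12    # assumption: decimal terabyte
WORKERS = 10               # worker count from the cluster configuration
BASELINE_SECONDS = 4 * 3600   # assumption: exactly 4 hours
OPTIMIZED_SECONDS = 7 * 60    # assumption: exactly 7 minutes


def throughput_mb_per_s(total_bytes: float, seconds: float, nodes: int = 1) -> float:
    """Sorted data per second, in megabytes, averaged over `nodes` machines."""
    return total_bytes / seconds / nodes / 1e6


if __name__ == "__main__":
    for label, seconds in (("baseline", BASELINE_SECONDS), ("optimized", OPTIMIZED_SECONDS)):
        total = throughput_mb_per_s(TERABYTE_BYTES, seconds)
        per_node = throughput_mb_per_s(TERABYTE_BYTES, seconds, WORKERS)
        print(f"{label}: {total:.0f} MB/s aggregate, {per_node:.0f} MB/s per worker")
```

These figures describe sorted output per second, not raw disk or network bandwidth: a distributed sort reads, shuffles, and writes the data more than once, so the underlying I/O rates are higher.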

