42 2013 Issue 03 | Dell.com/powersolutions
Business intelligence
Reprinted from Dell Power Solutions, 2013 Issue 3. Copyright © 2013 Dell Inc. All rights reserved.
Optimizing performance for big data analysis
Unlocking insights from vast data volumes requires a scalable system that quickly processes
both unstructured and structured data. The Intel® Distribution for Apache Hadoop provides
enhancements that boost performance while streamlining deployment.
By Armando Acosta and Maggie Smith
Once a little-known technology,
the Apache™ Hadoop® software
framework was developed to
support offline analytics. It has since
evolved into a powerful platform for managing
and processing the vast amounts of data deluging
enterprise systems.
From megabytes to yottabytes, information
repositories are growing by the nanosecond
through an influx of unstructured data from social
networking sites, video, images, mobile devices,
sensors and other sources. To gain insights from
massive amounts of diverse data types, many
organizations are looking beyond the restricted
capacity and capabilities of standard relational
database management systems (RDBMSs).
The same architectural design of RDBMSs that
helps ensure consistency and availability often
results in scalability limitations. In addition,
proprietary RDBMS extensions can optimize database
performance but subject organizations to vendor
lock-in, and commercial RDBMSs often carry costly
per-processor licenses.
As a result, the demand for innovative,
cost-effective big data offerings is intensifying. The
global big data technology and services market is
projected to expand at a 31.7 percent compound
annual growth rate through 2016, about seven
times greater than that of the information and
communication technology market.¹
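To put the growth figure in perspective, the short sketch below works through the compounding arithmetic. It is illustrative only: the four-year horizon (a 2012 base year compounding through 2016) and the reading of "about seven times greater" as a simple ratio of growth rates are assumptions, and the function names are hypothetical.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_baseline_rate(cagr: float, ratio: float) -> float:
    """Growth rate of a baseline market if `cagr` is `ratio` times that rate."""
    return cagr / ratio


if __name__ == "__main__":
    cagr = 0.317  # 31.7 percent, the rate quoted in the article
    years = 4     # assumption: 2012 base year, compounding through 2016
    print(f"market multiple after {years} years: {growth_multiple(cagr, years):.2f}x")
    # Assumption: "about seven times greater" read as a ratio of 7.
    print(f"implied baseline growth rate: {implied_baseline_rate(cagr, 7):.1%}")
```

Under these assumptions, a market compounding at 31.7 percent roughly triples over four years, while the comparison market would be growing at about 4.5 percent a year.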
Deriving actionable insights from huge data
volumes calls for a system that can process
multistructured data volumes rapidly and scale
easily to accommodate growth in a stable
and secure IT environment. The open-source
Hadoop platform shows enormous promise
for big data management and processing in a
number of scenarios, ranging from mining social
media profiles and flagging credit card fraud to
identifying top job candidates and predicting
weather patterns.
Yet for all Hadoop’s data-crunching prowess,
an absence of integrated support for strong data
security has slowed deployment efforts. Consider,
for example, a financial institution that combines
multiple data warehouses into a large Hadoop
cluster. Securing the data requires extensive use
of embedded encryption tools. However, many
Hadoop implementations are not optimized
to handle the processing load incurred by
encryption and decryption, which typically add
considerable latency and consume substantial
compute resources.
To address organizational needs to run high-
performance analytics on a secure platform,
Dell has teamed up with Intel to optimize the
Intel Distribution for Apache Hadoop software
for deployment on Dell™ hardware. The Intel
Distribution is designed to provide secure
enterprise-quality distributed-processing and data-
management software, as well as deployment
support and consulting services.
Finding the right fit
Because big data challenges vary so widely,
organizations need flexibility and choice
in a platform that helps them
gain valuable insights based on their specific use
cases. (For more information, see the sidebar,
“Distributed processing in action.”) When it comes
to big data management and analytics, one size
does not fit all.
To that end, Dell has expanded its Hadoop
offerings to include the Intel Distribution for
Apache Hadoop. The Intel Distribution joins the
field-tested Dell | Cloudera Hadoop Solution,
which combines Cloudera’s Distribution Including
Apache Hadoop (CDH) with Dell servers, Dell-
developed Crowbar deployment software and
networking components, as well as management
tools, training, technology support and
professional services. (For more information, see
the sidebar, “Insight acceleration.”)
Enhancing performance, security and manageability
The Intel Distribution is packaged with the
Hadoop platform and other software components
(see figure). Hadoop comprises the Hadoop
Distributed File System (HDFS™) framework,
designed for high-throughput data storage
and access on commodity hardware, and
the MapReduce framework, which enables
developers to write applications that execute
jobs in parallel on large clusters. Other key
components packaged with Hadoop are the Apache Hive™
data warehousing software and the Apache HBase™
database, a distributed, columnar big data store.
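The MapReduce programming model described above can be illustrated with a minimal, single-process sketch. This is not the Hadoop API: the function names are hypothetical, and the code runs on in-memory data only to show how the map, shuffle and reduce steps fit together. On a real cluster, the framework would distribute these steps across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on in-memory data."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big insights", "data at scale"]
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, both steps can run independently on different machines, which is what allows jobs to execute in parallel on large clusters.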
Figure: Taxonomy of the Intel Distribution for Apache Hadoop
Components shown: Intel Manager for Apache Hadoop software (deployment, configuration, monitoring, alerting and security); Apache Sqoop™ (data exchange); Apache Flume™ (log collector); Apache ZooKeeper™ (coordination); Apache HBase (columnar storage); Apache Pig™ (scripting); Apache Hive (SQL-like query); Apache Oozie™ (workflow); MapReduce (distributed processing framework); Apache HDFS (Hadoop Distributed File System)
¹ IDC, Worldwide Big Data Technology and Services 2012-2016 Forecast, doc #238746, December 2012.
With the power of Hadoop at its foundation,
the Intel Distribution features a number of
additional capabilities and optimizations
designed to streamline deployment and improve
performance. Intel® Manager for Apache
Hadoop is a web-based management
console that facilitates the installation,
configuration and administration of
the Hadoop cluster. Intel Manager also
supports resource monitoring and alerting
through the open-source Nagios® and
Ganglia monitoring systems, which are
included in the Intel Distribution. By taking
advantage of this powerful, easy-to-use
tool, IT can focus critical resources and
expertise on deriving business value from
the Hadoop environment rather than
managing the cluster.
The Intel Distribution includes extensions
to HBase and Hive that help improve
real-time transactional performance and
the end-user experience. Exceptional
encryption and decryption capabilities
heighten security and access control.
The Intel Distribution is optimized to
work with Intel® Advanced Encryption
Standard New Instructions (Intel® AES-NI)
technology, which is built into Intel® Xeon®
processors. Intel AES-NI is designed to
accelerate compute-intensive encryption and
decryption, helping reduce latency and
processor load.
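As a rough way to confirm that a cluster node exposes these instructions, the sketch below looks for the `aes` feature flag that x86 Linux systems list in `/proc/cpuinfo`. It assumes a Linux host with an x86 processor, and the helper name `has_aes_flag` is hypothetical. A positive result shows only that the processor advertises the instructions, not that a given Hadoop or Java configuration actually uses them.

```python
from pathlib import Path


def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags" and "aes" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-only path; absent on other systems
    if cpuinfo.exists():
        print("AES instructions advertised:", has_aes_flag(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo on this platform; cannot check this way.")
```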
In addition to leveraging the capabilities
of its processors, Intel can optimize
hardware features of its
solid-state drives (SSDs) and
10 Gigabit Ethernet (10GbE) adapters to
boost Hadoop performance, security
and manageability.
Also critical to accelerating Hadoop
performance is server optimization. The
Intel Distribution is designed to efficiently
integrate Hadoop with Dell servers to
deliver optimal solutions for a variety
of use cases. The Dell PowerEdge™
R720xd server is well suited for Hadoop
deployments because these environments
often require a 1:1 spindle-to-core ratio for
optimized performance. The PowerEdge
R720xd supports high spindle-to-core
ratios and includes options that help avoid
I/O bottlenecks.
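The spindle-to-core guideline is simple arithmetic, sketched below. The drive and core counts are hypothetical inputs chosen for illustration, not specifications of any particular server, and the function names are assumptions.

```python
from fractions import Fraction


def spindle_to_core_ratio(data_drives: int, cores: int) -> Fraction:
    """Return data drives (spindles) per processor core as an exact fraction."""
    if data_drives < 0 or cores <= 0:
        raise ValueError("need a non-negative drive count and at least one core")
    return Fraction(data_drives, cores)


def drives_needed(cores: int, target: Fraction = Fraction(1, 1)) -> int:
    """Smallest whole number of drives that meets the target ratio."""
    needed = target * cores
    # Round up: a fractional drive cannot be installed.
    return -(-needed.numerator // needed.denominator)


if __name__ == "__main__":
    # Hypothetical node: 16 cores and 12 data drives (illustrative numbers only).
    ratio = spindle_to_core_ratio(12, 16)
    print(f"ratio = {ratio} ({float(ratio):.2f} drives per core)")
    print("drives needed for 1:1 on 16 cores:", drives_needed(16))
```

A node that falls below the target ratio has more cores than its disks can keep fed, so adding drives or choosing a denser chassis may help more than adding processors.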
Insight acceleration
Organizations worldwide are turning to the open-source Apache Hadoop software
platform to support enterprise applications that analyze extremely large amounts
of diverse data. However, the inherent nature of Hadoop, with its distributed
architecture, adds layers of complexity, especially when it comes to deployment,
management and security. As a result, many organizations may have delayed
Hadoop deployments because they lack the necessary expertise in planning,
design, implementation and maintenance.
By providing the expert assistance, tools and technology resources needed,
Dell Services helps organizations move their Hadoop activities from the sandbox to
production environments to achieve business value. These services are tailored to
an organization’s short- and/or long-term objectives and help optimize the use of
emerging technologies, advance efficiencies and maximize the value of IT investments.
Experts at Dell Solution Centers located in key sites around the globe are available
to bolster the technical skills of those new to Hadoop and open-source technologies.
They can help participants gain hands-on experience with a variety of topics, ranging
from obtaining maximum performance from an application deployed on Dell servers
and storage to exploring cloud computing and big data using Hadoop.
At a Dell Solution Center, participants can attend a technical briefing with a
Dell expert, participate in an architectural design session or build a proof-of-concept
engagement to comprehensively validate a big data solution and streamline
deployment. Using an organization’s specific configurations and test data, participants
can discover how a big data solution from Dell meets their business needs.
A recent addition to Dell’s global network of solution centers is the Big Data
Innovation Center in Singapore, where organizations can test big data initiatives
and proofs of concept. The facility provides a big data stack that includes Dell
infrastructure using Intel Xeon E5 processor–based servers, Intel® 10 Gigabit
Ethernet networking, Intel® Solid-State Drives, the Intel Distribution for Apache
Hadoop and Revolution R Enterprise predictive analytics software. Organizations
that need to test-run their big data workloads can use the center to determine the
impact of big data initiatives on their business. The center also offers training to
help equip participants with the skills necessary for improving the quality of data
mining across a wide range of platforms and data sources.
For more information on Dell Solution Centers, visit dell.com/solutioncenters.
Putting big data to work
Originally a tool for offline analytics of web-scale
data, Hadoop is fast on its way to becoming
a business-critical platform for gathering
intelligence and actionable insights from vast
amounts of unstructured data. Helping to drive
this transformation is the Intel Distribution for
Apache Hadoop — an open-source offering
that unites the power of Hadoop and other
software elements with important performance
enhancements and hardware optimizations from
Intel. Together, this combination of capabilities
not only enhances security, performance
and manageability, but also provides a robust
foundation for advancing innovation in analytics
by the open-source community.
Learn more
Intel Distribution for Apache Hadoop
on Dell PowerEdge Servers:
qrs.ly/br3gyd4
Authors
Armando Acosta is a senior product line consultant
at Dell and has more than 15 years of experience in
the IT industry.
Maggie Smith is a senior marketing manager at Dell.
She is focused on big data solutions for enterprises
and has over 30 years of experience marketing
technology products.
Distributed processing in action
As big data becomes big business,
organizations are discovering
innovative ways to harness the value
of their data. The Intel Distribution
for Apache Hadoop helps these
organizations get the most out of
hardware performance, strengthen data
security and improve data management
and processing capabilities.
One company, for example, used
the Intel Distribution to support its
powerful search-engine technology
for life-science researchers. Dedicated
to furthering genomics research, the
company was having trouble managing
its large data sets. To scale in a cost-
effective manner, the company
deployed the Intel Distribution and
used Apache Hive and Apache Hadoop
for query and search. The company
also turned to Intel to optimize its
hardware and software for increased
performance. As a result, the company
achieved an exceptional increase in
throughput using fewer than half the
nodes previously deployed.
Another example is a large
telecommunications company that was
faced with eroding profits thanks in part
to the high cost of maintaining a complex
billing system. Poor-quality customer
service stemming from the beleaguered
billing system was prompting customer
churn. Unfortunately, the company’s
existing relational database management
system (RDBMS) could not deliver
storage scalability or real-time query
access. So the telecommunications
company selected the Intel Distribution
for real-time analytics and decision
support, as well as solid disaster
recovery and failover. The result:
exceptional support for a new business
intelligence initiative that provided a
lower total cost of ownership compared
to its traditional RDBMS.