Date posted: 14-Jan-2015
Category: Technology
Uploaded by: tony-pearson
© 2014 IBM Corporation
Backup Options IBM PureData™ System for Analytics, powered by Netezza
Tony Pearson – IBM Master Inventor and Senior IT Specialist
March 2014
IBM PureData for Analytics powered by Netezza – Backup Options
Part of the IBM Big Data Platform
Workload Optimized Solutions for All Your Analytic Needs
Analytics & Decision Management
Solutions
Big Data Infrastructure
IBM Big Data Platform
Accelerators
Information Integration & Governance
Visualization & Discovery
Application Development
Systems Management
Stream Computing
Hadoop System
Data Warehouse
PureData System for Analytics
Spend Less Time Managing and More Time Innovating
• No dbspace/tablespace sizing and configuration
• No redo/physical/logical log sizing and configuration
• No page/block sizing and configuration for tables
• No extent sizing and configuration for tables
• No Temp space allocation and monitoring
• No RAID level decisions for dbspaces
• No logical volume creations of files
• No integration of OS kernel recommendations
• No maintenance of OS recommended patch levels
• No JAD sessions to configure host/network/storage
Data Experts, not Database Experts
• Easy Administration Portal
• No software installation
• No indexes and tuning
• No storage administration
IBM’s Advantage: FPGA
• A real-time silicon SQL accelerator
• Dynamically reprogrammed for each individual query
• Eradicates ~95% of system I/O before the CPU ever sees it
• Completely unique to PDA
Simplicity and Ease of Administration
PureData System for Analytics Hardware Overview: Model N200x
• User Data Capacity: 192 TB*
• Data Scan Speed: 478 TB/hr*
• Load Speed (per system): 5+ TB/hr
• Active Data Slices: 96
• Power Requirements: 7.5 kW
• Cooling Requirements: 27,000 BTU/hr
* Assuming 4X compression
Scales from 1/4 Rack to 4 Racks
2 Hosts (Active-Passive)
• 2 Intel 2.7 GHz Sandy Bridge CPUs
• 7 x 300 GB SAS Drives
• Red Hat Linux 6 64-bit
7 PureData for Analytics S-Blades™
• 2 Intel 8 Core 2+ GHz CPUs
• 2 8-Engine Xilinx Virtex-6 FPGAs
• 128 GB RAM + 8 GB slice buffer
• Linux 64-bit Kernel
12 Disk Enclosures
• 288 x 600 GB SAS2 Drives
  – 240 for User Data
  – 14 for S-Blades
  – 34 Spare
• RAID 1 Mirroring
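The drive counts and the capacity figure above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not an IBM sizing formula. It assumes that roughly one third of each user-data drive holds primary data, with the rest going to the mirror copy and temporary work space; that split is an assumption and is not stated on the slide.

```python
# Figures taken from the hardware overview slide.
DRIVES_TOTAL = 288
DRIVES_USER_DATA = 240
DRIVES_S_BLADES = 14
DRIVES_SPARE = 34
DRIVE_SIZE_GB = 600
COMPRESSION_RATIO = 4  # "Assuming 4X compression"

# ASSUMPTION (not on the slide): one third of each user-data drive holds
# primary data; the remainder is mirror copy and temporary work space.
PRIMARY_SHARE_DIVISOR = 3


def drive_counts_add_up() -> bool:
    """Check that the three drive roles sum to the total drive count."""
    return DRIVES_USER_DATA + DRIVES_S_BLADES + DRIVES_SPARE == DRIVES_TOTAL


def user_capacity_tb() -> float:
    """Estimate user data capacity in decimal TB under the stated assumption."""
    raw_gb = DRIVES_USER_DATA * DRIVE_SIZE_GB        # raw space on user-data drives
    primary_gb = raw_gb // PRIMARY_SHARE_DIVISOR     # share holding primary data
    return primary_gb * COMPRESSION_RATIO / 1000     # apply compression, GB -> TB


if __name__ == "__main__":
    print("Drive counts consistent:", drive_counts_add_up())
    print("Estimated user capacity (TB):", user_capacity_tb())
```

Under that assumption the estimate lands on the 192 TB quoted on the slide, which suggests the slide figure already accounts for mirroring and work space.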
IBM PureData for Analytics – Reasons for Backup
Firmware (Linux, code)
• IBM will take care of Red Hat Enterprise Linux, Web Admin and other code as needed
  – No need for you to back it up yourself
Metadata (host catalog; global users, groups, permissions)
• Back this up to protect the host configuration from data corruption (rare)
User Data (Database 1: Tables A, B; Database 2: Tables X, Y, Z)
• Various reasons to back up database schema and contents
  – As part of firmware upgrade/downgrade
  – To transfer data to another system
  – Protect against hardware failure / disaster
  – Protect against data corruption
Compressed versus Text-format
[Diagram: source system (Firmware; User Data: Database 1 with Tables A, B; Database 2 with Tables X, Y, Z) restored to Firmware +1 (Upgrade), the same Firmware, Firmware -1 (Downgrade), or other database systems]
• Compressed database backup and compressed external tables: restore to same or higher firmware
• Text-format external table: restore to any, but slower, takes up more space
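The rule on this slide can be written down as a small decision helper. This is a minimal sketch; the `Target` names and the `choose_format` function are made up for illustration and are not part of any Netezza tool. It reads "restore to any" as including lower firmware levels and other database systems, which is an interpretation of the diagram.

```python
from enum import Enum


class Target(Enum):
    """Restore targets shown on the slide (names are illustrative)."""
    SAME_FIRMWARE = "same firmware"
    HIGHER_FIRMWARE = "higher firmware (upgrade)"
    LOWER_FIRMWARE = "lower firmware (downgrade)"
    OTHER_DATABASE = "other database system"


def choose_format(target: Target) -> str:
    """Return 'compressed' where the slide allows it, otherwise 'text'.

    Compressed backups restore only to the same or a higher firmware level.
    Text format restores anywhere, at the cost of speed and space.
    """
    if target in (Target.SAME_FIRMWARE, Target.HIGHER_FIRMWARE):
        return "compressed"
    return "text"


if __name__ == "__main__":
    for target in Target:
        print(f"{target.value}: {choose_format(target)}")
```

Text format is treated as the fallback here because it is the only option the slide lists as restorable to any target.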
Two Primary Approaches
1. Filesystem Approach
• Back up metadata and databases to external NAS storage devices
• Built-in CLI commands included
• Scripts for large databases available
2. External Backup Software
• Back up metadata and databases to external backup server/media
• User-initiated and automatic scheduled backups
• Supports disk, tape and virtual tape storage devices
[Diagram: Metadata (host catalog; global users, groups, permissions) and User Data (Database 1: Tables A, B; Database 2: Tables X, Y, Z)]
Network Configuration using SAN or LAN as Backup Network
[Diagram: the PureData system connects to the User Network and, over a separate Backup Network, to an external storage device]
Metadata
• Host Catalog: nzhostbackup
• Global users, groups, permissions: nzbackup -users
User Data (Database 1: Tables A, B; Database 2: Tables X, Y, Z)
• nzbackup -db (up to 16 streams)
• CREATE EXTERNAL TABLE
• nz_backup script for larger databases
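The commands named on this slide could be assembled into a backup run as sketched below. The helper only builds command strings and does not execute anything. The function name, the paths and the database name are hypothetical. The `-dir` and `-streams` options are given as remembered and are assumptions here, since the slide shows only `nzhostbackup`, `nzbackup -users` and `nzbackup -db`; check them against the administration guide for your release.

```python
import shlex
from typing import List

MAX_FILESYSTEM_STREAMS = 16  # upper limit quoted on the slide


def filesystem_backup_commands(
    databases: List[str],
    backup_dir: str,
    host_backup_file: str,
    streams: int = 1,
) -> List[str]:
    """Build (but do not run) the command lines for a filesystem backup."""
    if not 1 <= streams <= MAX_FILESYSTEM_STREAMS:
        raise ValueError(f"streams must be between 1 and {MAX_FILESYSTEM_STREAMS}")

    commands = [
        # Host catalog (metadata).
        ["nzhostbackup", host_backup_file],
        # Global users, groups and permissions. "-dir" is an assumed option.
        ["nzbackup", "-users", "-dir", backup_dir],
    ]
    for database in databases:
        command = ["nzbackup", "-db", database, "-dir", backup_dir]
        if streams > 1:
            # "-streams" is an assumed option name for multi-stream backup.
            command += ["-streams", str(streams)]
        commands.append(command)

    return [shlex.join(command) for command in commands]


if __name__ == "__main__":
    # Hypothetical NAS mount point and database name.
    for line in filesystem_backup_commands(
        ["SALES"], "/mnt/nas/backup", "/mnt/nas/backup/host.tar.gz", streams=4
    ):
        print(line)
```

Returning strings rather than running the commands keeps the sketch safe to try on any machine; a real script would add error handling and logging around each step.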
Proof-of-Concept (PoC) Configuration
• Storwize V7000 Unified comprising
  – Two file modules (2073-700)
  – One V7000 control enclosure (2076-324)
  – Code level 1.4.0.1
• File modules connected via 4 x 10 Gbit interfaces
• 24 x 600 GB 10K SAS drives installed in V7000 control enclosure
• Test database:
Test Conclusion / Best Practices
[Chart: backup throughput in MB/sec (axis 0 to 500) for 4 x RAID-5 4+P (4, 8, 20 NSDs), 2 x RAID-5 8+P (2, 10 NSDs) and 3 x RAID-10 4+4 (3, 6, 8 NSDs)]
6+ TB/h uncompressed data*
* ~1.7 TB/h compressed data
• Matching the GPFS block size to the RAID full stripe width is beneficial
• Matching the number of NSDs to the number of RAIDs is beneficial
• When matching the number of NSDs to the number of RAIDs, usage of sequential NSDs is beneficial
• Small RAID-5 arrays (4+P) with the matching number of NSDs and mdisks (RAIDs) and 2 mount points show the best performance (multiple streams)
• Supports both nzbackup/nzrestore CLI and nz_backup/nz_restore scripts
• Focusing on backup performance
  – Run multiple backup streams
• Focusing on restore performance
  – Run single backup stream
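The throughput figures above mix MB/sec (chart axis) and TB/h (callouts), so a small converter helps when estimating a backup window. The sketch below uses decimal units (1 TB = 1,000,000 MB). The function names are illustrative. Treating the 192 TB capacity and 4X compression from the hardware slide as the amount to back up is an assumption made for the example, not a measured result.

```python
def mb_per_sec_to_tb_per_hour(mb_per_sec: float) -> float:
    """Convert MB/sec to TB/h using decimal units."""
    return mb_per_sec * 3600 / 1_000_000


def tb_per_hour_to_mb_per_sec(tb_per_hour: float) -> float:
    """Convert TB/h to MB/sec using decimal units."""
    return tb_per_hour * 1_000_000 / 3600


def backup_window_hours(data_tb: float, throughput_tb_per_hour: float) -> float:
    """Best-case hours needed to move data_tb at a sustained throughput."""
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_tb / throughput_tb_per_hour


if __name__ == "__main__":
    # ~1.7 TB/h of compressed data expressed on the chart's MB/sec axis.
    print(round(tb_per_hour_to_mb_per_sec(1.7)), "MB/sec")

    # ASSUMPTION: a full system holds 192 TB of user data at 4X compression,
    # so about 48 TB of compressed data would have to be written.
    compressed_tb = 192 / 4
    print(round(backup_window_hours(compressed_tb, 1.7), 1), "hours")
```

The ~1.7 TB/h figure works out to a little over 470 MB/sec, which is consistent with a chart axis that tops out at 500 MB/sec. The window estimate is a lower bound because it ignores ramp-up, catalog work and contention on the backup network.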
Two Primary Approaches
2. External Backup Software
• Back up metadata and databases to external backup server/media
• User-initiated and automatic scheduled backups
• Supports disk, tape and virtual tape storage devices
Network Configuration
[Diagram: the PureData system connects to the User Network and, over a separate Backup Network, to an external backup server]
Metadata
• Host Catalog: nzhostbackup to local file, then transfer to backup server
• Global users, groups, permissions: nzbackup -users
User Data (Database 1: Tables A, B; Database 2: Tables X, Y, Z)
• nzbackup -db (up to 1000 streams)
• Specify -connector and -connectorArgs
• Create scripts for automatic schedule
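A scheduled backup through external backup software could be scripted along the lines below. This is a sketch under stated assumptions: the helper names, the script path and the database name are hypothetical, and the connector value `tsm` is an assumption, since the slide names only the `-connector` and `-connectorArgs` options. The connector arguments are passed through untouched because their format depends on the backup product.

```python
import shlex


def connector_backup_command(
    database: str, connector: str = "tsm", connector_args: str = ""
) -> str:
    """Build (but do not run) an nzbackup command line that uses a connector."""
    command = ["nzbackup", "-db", database, "-connector", connector]
    if connector_args:
        # Passed through as-is; the expected format is product specific.
        command += ["-connectorArgs", connector_args]
    return shlex.join(command)


def nightly_cron_entry(script_path: str, hour: int, minute: int = 0) -> str:
    """Return a standard five-field crontab line that runs a script daily."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"{minute} {hour} * * * {shlex.quote(script_path)}"


if __name__ == "__main__":
    # Hypothetical database name and wrapper script location.
    print(connector_backup_command("SALES"))
    print(nightly_cron_entry("/export/home/nz/backup_all.sh", hour=2))
```

Keeping the schedule in cron and the backup logic in a wrapper script is one simple option; the backup server's own scheduler is another.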
External Backup Architecture
[Diagram: Client code, LAN, Backup Server (Master Catalog, Media Management) labeled IBM Tivoli Storage Manager (TSM) server, SAN, Storage Hierarchy]
Storage Hierarchy
• Disk
• Physical Tape
• Virtual Tape
External Backup Architecture – TSM Proxy Node
[Diagram: XBSA code, LAN, Proxy node, LAN Free Storage agent, Backup Server (Master Catalog, Media Management), SAN, Storage Hierarchy (Physical Tape, Virtual Tape)]
Proxy node
• Sends data directly to physical or virtual tape over SAN fabric
• Registers copies with Master Catalog
• Can support multiple PureData for Analytics systems
TSM client code sends backup to Proxy node
TSM server manages media, tape reclamation, backup copy pools, etc.
External Backup Architecture – TSM LAN Free
[Diagram: XBSA code, LAN Free Storage agent, LAN, Backup Server (Master Catalog, Media Management), SAN, Storage Hierarchy (Physical Tape, Virtual Tape)]
TSM client code sends backups directly to physical or virtual tape over SAN fabric
TSM client code registers backup copies with Master Catalog
TSM server manages media, tape reclamation, backup copy pools, etc.
LAN Free
• Avoids traffic congestion on the LAN by using the SAN directly
• Will consume more CPU resources on the PureData for Analytics system
Summary
1. Use Filesystem Method with SAN or NAS storage device such as Storwize V7000 Unified
2. Use IBM Tivoli Storage Manager server infrastructure to back up PureData for Analytics systems
About the Speaker
Mr. Tony Pearson
Master Inventor,
Senior Managing Consultant
IBM System Storage
Tony Pearson is a Master Inventor and Senior IT storage consultant for the IBM System Storage™ product line.
Tony Pearson joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products.
In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, as well as various storage software products. He interacts with clients, speaks at conferences and events, and leads workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.
Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs of 2006 for the IT storage industry by “Networking World” magazine. The blog was published in book form as Inside System Storage: Volume I and Volume II, both available from Lulu publishing.
Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.
9000 S. Rita Road
Bldg 9070 Mail 9070
Tucson, AZ 85744
+1 520-799-4309 (Office)
Additional Resources
Email: [email protected]
Twitter: http://twitter.com/az990tony
Blog: http://ibm.co/brAeZ0
Books: http://www.lulu.com/spotlight/990_tony
IBM Expert Network: http://www.slideshare.net/az990tony
Trademarks and disclaimers
© IBM Corporation 2011. All rights reserved.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography.
Photographs shown may be engineering prototypes. Changes may be incorporated in production models.
Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.
ZSP03490-USEN-00