Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES May, 2017

E: [email protected] T: 1-800-526-2821

359 Terry Fox Drive, Suite 230, Kanata, Ontario K2K 2E7

kanatek.com@kanatek_inc

Kanatek Technologies

Contents

Introduction
Overview
Architecture
SDFS File System Service
Data Writes
Data Reads
De-duplication Storage Engine
Data Blocks
Cloud Storage of Data Blocks
Reference configurations
Standalone Opendedupe system
Customer Pain Points/Business Challenges
Applications
Case 1: Backup and Replication
Case 2: Tape Elimination
Case 3: Migrate Data from EMC Data Domain to alternate Object Store
Case 4: Long term archival to lower storage costs
Case 5: Zero Data Movement, A.I.R based Cloud DR recovery of on-premise servers
Case 6: Optimized replication between different cloud storage targets
Case 7: NetBackup in the cloud
Conclusion

Introduction

Opendedupe gives enterprises a way to optimize storage usage and protect large amounts of data. As an open source product it carries no capital cost, and customers who adopt it have the option of purchasing enterprise-grade 24/7 support. Opendedupe is an excellent fit for backup and long-term archival retention, whether on-premise, off-premise (in the cloud) or a combination of the two, and in each case it uses the smallest amount of actual storage possible.

OVERVIEW

Opendedupe originated in 2010 and comprises both the SDFS filesystem and SDVOL volume management. SDFS performs inline de-duplication and can expand flexibly onto either local or cloud storage (typically using the standard S3 protocol used by the likes of Amazon, Google, Azure, etc.), while SDVOL is a distributed and expandable volume manager that provides inline de-duplication and replication to any filesystem.

The two components, SDFS and SDVOL, can be deployed in either a standalone or distributed, multi-node configuration. In a standalone configuration, inline de-duplication, replication and unlimited snapshot capabilities are available. Multi-node deployments gain global, intra-volume de-duplication, and configurable block storage redundancy with block storage expandability. A unique and powerful feature of Opendedupe is that it stores a copy of its hash and metadata lookup information with the de-duplicated data in the cloud or local storage.

ARCHITECTURE

SDFS’s design decouples block data from file metadata, allowing any number of logical files to reference the same unique data block. The data block has no knowledge of what files reference it, or where the files are located. The metadata contains a hash that is associated with the logical location within a file. As the data is de-duplicated and shared between volumes, the I/O is significantly reduced across both the network and on the system.

SDFS has 3 basic components:

• SDFS File System Service (FSS)
• De-duplication Storage Engine (DSE)
• Data Chunks

Figure 1 - Read/Write flow from application layer to local or S3 bucket storage

SDFS File System Service

The SDFS filesystem provides a POSIX-compliant view of the de-duplicated data and is a logical container responsible for all filesystem-level activity (e.g. chunking data into blocks for de-duplication, file system statistics, snapshots). The SDFS File System Service stores the metadata for files and folders, and the mapping file for the actual file data. The actual data blocks or chunks are stored either within the local De-duplication Storage Engine or, in a multi-node deployment, in a node’s DSE.

SDFS logical files are represented by two metadata components, held in two separate files. The first represents the filesystem namespace that is presented when the filesystem is mounted. It contains the filesystem attributes associated with the file (for example size, atime, ctime and ACLs) and a link to the associated map file. The second is the mapping file, which contains the list of records corresponding to the locations of the data blocks that make up the file. Each record contains a hash entry, whether the data is a duplicate, whether it is stored on remote nodes, and which nodes the data can be found on.
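
The two metadata components described above can be pictured as a pair of record types. The following is a minimal sketch in Python and not the SDFS on-disk format: the class names, field names and the `chunks_for_range` helper are hypothetical and were chosen only to mirror the prose.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MapRecord:
    """One entry in a file's mapping file (hypothetical layout)."""
    file_offset: int        # logical position of the chunk within the file
    chunk_hash: bytes       # hash that identifies the unique data chunk
    is_duplicate: bool      # True if the chunk had already been stored
    remote_nodes: List[str] = field(default_factory=list)  # nodes holding the chunk, if remote


@dataclass
class FileEntry:
    """Namespace metadata for one logical file (hypothetical layout)."""
    path: str
    size: int
    atime: float
    ctime: float
    acls: List[str] = field(default_factory=list)
    # Stands in for the link from the namespace entry to the map file.
    map_records: List[MapRecord] = field(default_factory=list)


def chunks_for_range(entry: FileEntry, offset: int, length: int,
                     chunk_size: int) -> List[MapRecord]:
    """Return the mapping records that overlap [offset, offset + length)."""
    end = offset + length
    return [r for r in entry.map_records
            if r.file_offset < end and r.file_offset + chunk_size > offset]
```

Keeping the hash in the mapping record, rather than a pointer back from the data chunk, is what lets any number of files reference the same chunk without the chunk knowing about them.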

Data Writes

A data write to the SDFS filesystem is first passed from the kernel to the SDFS file-system process via the FUSE library. SDFS takes the data from the FUSE layer API and breaks it into fixed-size chunks, which are immediately cached on a per-file basis for active I/O in a FIFO buffer. Chunks expire from the FIFO buffer as new data enters or after two seconds. Expired chunks are moved to a flushing buffer, which in turn is emptied by several configurable write threads. Each thread computes the hash for the data block and searches to see whether that hash, and so the data, has already been stored. If it has not, the block is stored. Once the data is confirmed as stored, the thread updates the record associated with the data block in the mapping file.
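
The chunk, hash and lookup steps of the write path can be sketched as follows. This is an illustration under stated assumptions, not SDFS code: the 4 KB chunk size, the use of SHA-256 and the `write_data` function are all hypothetical, a plain dictionary stands in for the DSE hash table, and the FIFO and flushing buffers are left out for brevity.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4096  # assumed chunk size, for illustration only


def write_data(data: bytes, store: Dict[bytes, bytes],
               chunk_size: int = CHUNK_SIZE) -> List[Tuple[bytes, bool]]:
    """Chunk, hash and de-duplicate `data`; return mapping records.

    Each record is (hash, was_duplicate). `store` stands in for the
    De-duplication Storage Engine's hash table.
    """
    records = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).digest()  # hash choice is an assumption
        duplicate = digest in store
        if not duplicate:
            store[digest] = chunk                # only unique chunks are stored
        records.append((digest, duplicate))
    return records
```

Writing two identical 4 KB chunks followed by a different one yields three mapping records but only two stored chunks, which is the saving that de-duplication provides.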

Data Reads

When a read request is made for data on the SDFS filesystem, the file position and data length are passed via the FUSE layer to the SDFS application. The record(s) associated with the file’s position and length are looked up in the mapping file, and the relevant block data is recalled by looking up each record’s hash in local storage.
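
The read path is the reverse lookup: position and length select mapping records, and each record's hash selects a stored chunk. The sketch below assumes fixed-size chunks and represents the mapping file as an ordered list of hashes. The `read_range` function and its parameters are hypothetical names used for illustration.

```python
from typing import Dict, List


def read_range(map_hashes: List[bytes], store: Dict[bytes, bytes],
               position: int, length: int, chunk_size: int) -> bytes:
    """Reassemble `length` bytes starting at `position` from stored chunks."""
    if length <= 0:
        return b""
    first = position // chunk_size                  # first covering record
    last = (position + length - 1) // chunk_size    # last covering record
    # Look up each covering record's hash and fetch its chunk.
    joined = b"".join(store[h] for h in map_hashes[first:last + 1])
    skip = position - first * chunk_size            # offset inside first chunk
    return joined[skip:skip + length]
```

Note that the same hash may appear more than once in `map_hashes`, so a single stored chunk can serve several positions in the file.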

De-duplication Storage Engine

The DSE stores, retrieves and removes all de-duplicated data chunks. The DSE can be run as part of an SDFS volume (the default) or as part of a global de-duplication storage pool. The chunks of data are stored either on local disk or at a cloud provider and are indexed for retrieval with a custom-written hash table.

Data Blocks

Unique data chunks are stored in a Data Block by the DSE, either on disk or in the cloud. The data chunks are stored in sequence within the chunkstore folder. Data Blocks default to 40 MB in size but can be set as high as 2 GB. Once a block has filled, or has timed out waiting for new data (six seconds), it is closed and is no longer available for writing. New blocks are created as unique data is written into the DSE; each is given a unique long integer identifier and is either compressed or encrypted in the chunkstore. The associated DSE map file is updated, and the data is written to its permanent location on disk, or uploaded to the cloud and cached locally.
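
The fill-or-timeout rule for closing a Data Block can be illustrated with a small container class. This is a sketch only: `DataBlock` and `try_append` are hypothetical names, the 40 MB default and six-second wait come from the text, and the timeout is checked lazily on the next append rather than by a background timer, which is a simplification.

```python
import time
from typing import List, Optional

DEFAULT_BLOCK_SIZE = 40 * 1024 * 1024  # 40 MB default from the text
IDLE_TIMEOUT_S = 6.0                   # six-second wait from the text


class DataBlock:
    """Append-only container of unique chunks (illustrative only)."""

    def __init__(self, block_id: int, max_size: int = DEFAULT_BLOCK_SIZE,
                 now: Optional[float] = None) -> None:
        self.block_id = block_id       # unique long integer identifier
        self.max_size = max_size
        self.chunks: List[bytes] = []
        self.size = 0
        self.closed = False
        self.last_write = time.monotonic() if now is None else now

    def try_append(self, chunk: bytes, now: Optional[float] = None) -> bool:
        """Append `chunk` if the block is open and has room, else close it."""
        now = time.monotonic() if now is None else now
        if self.closed:
            return False
        timed_out = now - self.last_write > IDLE_TIMEOUT_S
        too_big = self.size + len(chunk) > self.max_size
        if timed_out or too_big:
            self.closed = True         # caller should open a new block
            return False
        self.chunks.append(chunk)      # chunks are kept in write order
        self.size += len(chunk)
        self.last_write = now
        if self.size == self.max_size:
            self.closed = True         # block is exactly full
        return True
```

A caller that receives `False` would create a new block with the next identifier and retry the append there. The `now` parameter exists only so the timing behaviour can be exercised without waiting.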

Cloud Storage of Data Blocks

Data that is uploaded to the cloud is also cached locally. Should a data block be flushed from the local cache, it is retrieved from the cloud storage provider and re-cached locally the next time it is needed. The data is then read back to the requesting application from the local cache. Should the data be stored in an Amazon Glacier repository, the volume is informed that the data is archived, and the volume initiates an archive retrieval process.
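
The cache behaviour described above is a read-through cache. The sketch below shows the idea with a least-recently-used eviction policy. The eviction policy, the `BlockCache` class and the `fetch` callback are assumptions made for illustration, since the text does not say how Opendedupe chooses which blocks to flush.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Small LRU read-through cache in front of a cloud fetch function."""

    def __init__(self, fetch: Callable[[int], bytes], capacity: int) -> None:
        self.fetch = fetch             # hypothetical: downloads a block by id
        self.capacity = capacity       # maximum number of cached blocks
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.cloud_reads = 0

    def read(self, block_id: int) -> bytes:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            return self.blocks[block_id]
        data = self.fetch(block_id)             # cache miss: go to the cloud
        self.cloud_reads += 1
        self.blocks[block_id] = data            # re-cache locally
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data
```

Every read is answered from the local cache, and the cloud is contacted only on a miss, which matches the order of events in the paragraph above.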

REFERENCE CONFIGURATIONS

Standalone Opendedupe system

The configuration below is designed to manage 100 TB of back-end stored (de-duplicated) data and assumes physical resources with an 8:1 de-duplication rate.

• 36 GB of RAM
• 16-core CPU @ 2.3 GHz or better
• 103 TB of storage (actual data, operating system, hash table and metadata requirements)
• If using cloud storage, a minimum link speed of 5 MB/s is required, and a faster link may be needed.
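
As a rough check on these figures, the arithmetic below works out how much logical data 100 TB of de-duplicated storage represents at 8:1, and how long an initial upload of 100 TB would take at the minimum link speed. It assumes decimal units (1 TB = 1,000,000 MB) and a fully utilized link. The function names are hypothetical.

```python
def logical_capacity_tb(stored_tb: float, dedup_ratio: float) -> float:
    """Logical (pre-de-duplication) data represented by `stored_tb`."""
    return stored_tb * dedup_ratio


def upload_days(stored_tb: float, link_mb_per_s: float) -> float:
    """Days needed to push `stored_tb` of unique data over the link."""
    seconds = stored_tb * 1_000_000 / link_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 86_400                          # seconds per day


print(logical_capacity_tb(100, 8))   # 800 TB of logical backup data
print(round(upload_days(100, 5)))    # about 231 days at 5 MB/s
```

Under these assumptions, a full 100 TB upload at 5 MB/s would take most of a year, which illustrates why the text notes that a faster link may be needed.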

Figure 2 - Single Veritas NetBackup domain, on-premise with cloud replication using DPOO or capacity licensing

Customer Pain Points/Business Challenges

Businesses today have many reasons to store and archive large amounts of unstructured data, and doing so adds a growing operational cost. De-duplicating this data lets customers reduce both the electronic footprint and the physical footprint of their back-end storage. Using the 8:1 de-duplication ratio specified in the reference configurations, Opendedupe coupled with Veritas NetBackup can reduce customers’ on-premise backup storage needs significantly and extend the storage density of existing storage. This means longer-term images can be retained on-premise and, with the ability to use cloud S3-based storage, longer-retention archival data can be moved off-site, with a commensurate reduction in media handling fees and physical system costs. Because Opendedupe also stores a copy of the metadata needed to access the data chunks with the data sent to the cloud, a customer can also survive a catastrophic event at the source site by simply deploying a new Opendedupe server and configuring it to access the appropriate S3 bucket.

[Figure 2 labels: NetBackup Media server with Opendedupe OST Plugin; 32 GB RAM; 16 Core CPU, 2.3 GHz; 3 x 1 TB mirrored volumes (OS, hash, meta); 100 TB RAIDS, de-duplicated data pool; S3 object store in the cloud]

APPLICATIONS

The following are some high-level use cases of Opendedupe with Veritas NetBackup presented for consideration:

Case 1: Backup and Replication

Consider a customer with Veritas NetBackup 7.7.3. Due to business requirements, they cannot upgrade to NetBackup 8.1 or later for a protracted period, but they require a way to store data in the cloud. The Opendedupe OST plugin (on the NB HCL) allows a customer to replicate images from their disk solution (basic, advanced or MSDP) to any S3-based cloud solution of their choice, on-site or off-site. Should the cloud solution be on-premise, the customer can protect the data for disaster recovery (DR) purposes by using Opendedupe’s built-in de-duplicated replication features to create a synchronous or asynchronous copy of the data at an alternate location.

[Diagram labels: NB Master/Media Server; OpenDedup Server; On Premise Cloud; Public Cloud Provider; data replicated to both destinations]

Case 2: Tape Elimination

Consider a customer with a heavily tape-based NetBackup solution who wants to move to a disk-based solution combined with the cloud. Opendedupe’s OST plugin, combined with either Capacity or DPOO licensing, can move the customer efficiently to cloud-based archival storage, with a resulting reduction in data center footprint because tape libraries are no longer needed. The benefit is that the data is protected by the cloud provider’s data protection solutions and is no longer subject to the physical damage that is inherent in the use of tape.

[Diagram labels: NB Master/Media Server; Opendedupe Server]

Case 3: Migrate Data from EMC Data Domain to alternate Object Store

EMC Data Domain (DD) storage, while effective, is costly for customers. Given the way Opendedupe’s OST plugin works, the data can be duplicated efficiently from the DD devices to an Opendedupe solution and written to any lower-cost S3-compatible object storage, either on-site or off-site.

[Diagram labels: NB Master/Media Server; DataDomain; OpenDedup Server; Private Cloud; Data Domain data migrated to OpenDedup]

Case 4: Long term archival to lower storage costs

With the advent of cloud storage services, Opendedupe is a good fit because it provides reliable, low-cost storage for long-term data retention to meet compliance requirements, with the data de-duplicated and encrypted (in flight and at rest). The customer avoids costly maintenance and infrastructure support for tape libraries and can capitalize on the physical infrastructure resources and operational time that are freed up once the tape library no longer needs care and feeding.

[Diagram labels: NB Master/Media Server; OpenDedup Server]

Case 5: Zero Data Movement, A.I.R based Cloud DR recovery of on-premise servers

Replication with traditional storage targets requires unique data synchronization between the primary and replica target. Historically this worked quite well but, as backup datasets have become larger, backup target replication has come to consume a greater part of site-to-site bandwidth, potentially up to 70% of inter-site traffic.

Opendedupe can replicate using this method, and it also includes a Cloud Optimized method of replication that reduces replication traffic to nearly zero.

With Opendedupe’s Zero Data Movement feature, multiple NetBackup domains can share the same object storage bucket for reads and writes. This means that organizations can back up an image from one NetBackup domain and restore it from another simply by reading the data and metadata directly from the shared bucket (backup image import required).

This is effective for organizations that use, or want to use, cloud infrastructure for disaster recovery. Using Opendedupe, an organization can back up its on-premise servers to one cloud S3 storage provider, and then restore those servers to an alternate cloud S3 provider’s storage through a second NetBackup domain in the cloud, without any additional data movement.

Additionally, since the S3 object storage is co-located with the secondary cloud environment, faster restore speeds and recovery are possible.

[Diagram labels: Data Center 1 and Data Center 2, each with an NB Master/Media Server and an OpenDedup Server; S3 Connector (two); S3 Data Bucket; backup image write; backup image import]

Case 6: Optimized replication between different cloud storage targets

For DR and long-term retention, backup data protection is paramount to prevent data corruption and ensure data availability in the event of a DR incident. With traditional backup storage targets, data consistency and availability are achieved by creating multiple optimized replicas of backup images across many data centers. This method can achieve five nines (99.999% uptime) of availability, which is typically required for most DR plans.

When migrating to cloud storage as a target, it is important to keep in mind that even the best cloud infrastructure vendors do not guarantee or report 5 nines of availability for their object storage. For organizations that require high availability of their backup data, replicating backup data to two cloud storage vendors is critical.
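
To put the availability figures in perspective, the arithmetic below converts an availability percentage into downtime per year and estimates the combined availability of data held at two providers. It assumes the providers fail independently of each other, which is an idealization, and the 99.9% figure in the usage lines is a hypothetical value chosen for illustration rather than any vendor's published number.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(*availabilities: float) -> float:
    """Availability of N replicas, assuming failures are independent."""
    unavailable = 1.0
    for a in availabilities:
        unavailable *= (1.0 - a)   # all replicas must be down at once
    return 1.0 - unavailable


print(downtime_minutes_per_year(0.99999))    # roughly 5.3 minutes per year
print(combined_availability(0.999, 0.999))   # roughly 0.999999
```

Under the independence assumption, two stores that each fall short of five nines can together exceed it, which is the reasoning behind replicating to two vendors.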

Opendedupe includes optimized replication and Auto Image Replication (AIR) support that enables organizations to replicate backup images between two or more cloud storage vendors with reduced bandwidth and storage costs.

Like traditional backup storage targets, Opendedupe enables backup images to be replicated to secondary storage targets, only ever moving unique data across the wire. But, unlike traditional storage targets, Opendedupe enables organizations to migrate data automatically between cloud storage vendors, not just between on-premises storage targets.

This means organizations can replicate backup images between Microsoft Azure and Amazon AWS, or between on-premise object storage and off-premise object storage, without any additional infrastructure.
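
The idea of moving only unique data across the wire can be sketched as a comparison of hash sets between the source and the target. This is a simplified illustration and not Opendedupe's replication protocol: the `replicate` function is hypothetical, and dictionaries keyed by chunk hash stand in for the two storage targets.

```python
from typing import Dict


def replicate(source: Dict[bytes, bytes], target: Dict[bytes, bytes]) -> int:
    """Copy chunks missing from `target`; return the number of bytes sent."""
    sent = 0
    for digest, chunk in source.items():
        if digest not in target:     # target already holds it: send nothing
            target[digest] = chunk
            sent += len(chunk)
    return sent
```

Running the function a second time against the same pair of stores sends nothing, because every hash is already present at the target.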

[Diagram labels: NB Master/Media Server; OpenDedup Server; Cloud Provider 1; Cloud Provider 2; data replicated to both destinations]

Case 7: NetBackup in the cloud

MSDP and traditional de-duplicated backup storage targets rely on block storage as primary storage for backup images. On-premise this can be cost effective, but when migrating to cloud infrastructure the cost of block storage can be prohibitive for large backup environments.

As an example, block storage within Amazon’s EBS, at the time of writing this document, costs $0.10 (U.S.) per GB. Depending on the IO requirements for block storage, this cost can be even higher. In contrast, S3 storage costs $0.023 (U.S.) per GB per month. Opendedupe leverages S3 storage for backup images and reduces cost to approximately 1/5 the cost of traditional backup targets.
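
The price comparison can be checked with a few lines of arithmetic. The sketch below uses the two list prices quoted above and assumes both are per GB per month and that 1 TB = 1,000 GB. The 100 TB volume and the function name are hypothetical.

```python
EBS_PER_GB_MONTH = 0.10    # block storage price quoted in the text (USD)
S3_PER_GB_MONTH = 0.023    # object storage price quoted in the text (USD)


def monthly_cost_usd(stored_tb: float, price_per_gb: float) -> float:
    """Monthly storage cost for `stored_tb`, using 1 TB = 1,000 GB."""
    return stored_tb * 1000 * price_per_gb


ratio = S3_PER_GB_MONTH / EBS_PER_GB_MONTH   # about 0.23

print(monthly_cost_usd(100, EBS_PER_GB_MONTH))  # about 10,000 USD per month
print(monthly_cost_usd(100, S3_PER_GB_MONTH))   # about 2,300 USD per month
```

On list price alone the ratio is about 0.23, slightly above one fifth. Any additional IO charges on the block storage side, which the text mentions, would lower the ratio further.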

[Diagram labels: Data Center 1 and Data Center 2, each with an NB Master/Media Server and an OpenDedup Server; VPN (two); S3 Connector (two); S3 Data Bucket; Virtual NB Master/Media Server; Virtual OpenDedup Server; virtual system restored on cloud provider infrastructure]

Conclusion

Opendedupe coupled with Veritas NetBackup addresses several business concerns and challenges. It permits global de-duplication of backup data across multiple NetBackup domains. Any NetBackup media server, including appliances, can access cloud storage solutions using the S3 protocol. Migration of data from Data Domain to lower-cost object storage becomes possible and is optimized. Finally, Opendedupe eliminates the need for the care and feeding of tape libraries, storing data either on-premise or in the cloud, in a dense, de-duplicated state, with encryption.

Where cloud storage is used, site failure does not mean the loss of data, as it is possible to recall the hash table and other metadata from the cloud and recover data from the cloud.

Authors: David Kerrivan (Kanatek), Darryl Levesque (Kanatek), Sam Silverberg (Veritas).

