040419 SAN Forum

Date post: 19-Jun-2015
Category:
Upload: thiru-raja
View: 135 times
Download: 2 times
Share this document with a friend
Popular Tags:
28
1 U.S. Department of the Interior U.S. Geological Survey A&T Advisory Board EDC Storage Area Network (SAN) April 19, 2004 Ken Gacke, Brian Sauer, Doug Jaton [email protected] [email protected] [email protected]
Transcript
  • 1. U.S. Department of the Interior / U.S. Geological Survey. A&T Advisory Board: EDC Storage Area Network (SAN). April 19, 2004. Ken Gacke, Brian Sauer, Doug Jaton [email_address] [email_address] [email_address]

2. Agenda

  • Storage Architecture
  • EDC SAN Architectures
    • Digital Reproduction SAN
    • Landsat SAN
    • LPDAAC SAN
  • SAN Reality Check

3. Storage Architecture: Direct Attached Storage (diagram: Linux, Sun, and SGI servers, each with its own direct attached storage, connected over Ethernet)

  • Difficult to reallocate resources
  • File sharing via Network (NFS, FTP)
    • NFS Performance/Security Issues
    • Duplicate copies of data
    • I/O Performance/Bandwidth
  • Data Availability Concerns
    • Server failure => no data access

4. Storage Technology: SAN Configuration (diagram: Linux, Sun, and SGI servers on Ethernet, attached through a fibre switch to a shared disk farm)

  • Hardware Solution
    • Fibre Channel Switch
    • Fibre Channel RAID
  • Logical Reallocation of Resources
  • File sharing via Network (NFS, FTP)
    • NFS Performance/Security Issues
    • Duplicate copies of data
    • I/O Performance/Bandwidth
  • Data Availability Concerns
    • Server failure => no data access

5. Storage Technology: Clustered File System SAN Configuration (diagram: Linux, Sun, and SGI servers on Ethernet, attached through a fibre switch to shared storage running a clustered file system)

  • Hardware/Software Solution
    • Fibre Channel Switch
    • Fibre Channel RAID
    • Sharable File System
  • Logical Reallocation of Resources
  • Direct File Sharing
    • Single data copy
    • Efficient I/O
    • Scalable Bandwidth
  • High Data Availability

(diagram: each server runs CXFS/CFS on top of the shared file system)

6. Storage Architecture

  • SAN Goals
    • File sharing across multiple servers
      • Heterogeneous Platform Support (IRIX, Solaris, Linux)
      • Reduce number of file copies
      • Improve I/O efficiency
        • Reduce I/O requirements on server
        • Reduce Network load
        • Reduce time required to transfer data (see the sketch after this list)
    • Storage Management
      • Increase disk storage utilization
      • Logical reallocation of storage resources
    • Data Availability
      • Maintain data access when a server fails
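
To make the transfer-time and file-copy goals above concrete (see the note in the transfer-time bullet), here is a minimal back-of-envelope sketch. The 10GB file size, the server count, and the link list are illustrative assumptions, not figures from the presentation; the only point is that a shared SAN file system removes both the network hop and the duplicate copy.

```python
# Back-of-envelope comparison: moving a product over the network (NFS/FTP)
# versus reading a single shared copy over the SAN fabric.
# The file size, server count, and link speeds are illustrative assumptions.

FILE_GB = 10
file_mb = FILE_GB * 1024

links_mb_per_s = {
    "100Mb Ethernet (NFS/FTP)": 100 / 8,    # ~12.5 MB/s, ignoring protocol overhead
    "1Gb Ethernet (NFS/FTP)":   1000 / 8,   # ~125 MB/s
    "1Gb Fibre Channel (SAN)":  100,        # ~100 MB/s nominal
    "2Gb Fibre Channel (SAN)":  200,        # ~200 MB/s nominal
}

for link, rate in links_mb_per_s.items():
    minutes = file_mb / rate / 60
    print(f"{link:26s} {minutes:5.1f} min to move {FILE_GB} GB")

# With NFS/FTP the file is stored on both the source and destination servers;
# with a clustered SAN file system every host reads the same on-disk copy.
servers = 3
print(f"\nper-server copies: {servers} x {FILE_GB} GB = {servers * FILE_GB} GB on disk")
print(f"single shared copy: {FILE_GB} GB on disk, no network transfer")
```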

7. Digital Reproduction CR1 SAN. April 19, 2004. Ken Gacke, SAIC Contractor [email_address]

8. Historical Architecture (No SAN): Product Distribution over Ethernet; UniTree server with tape drives (8x9840, 2x9940B)

  • Architecture Notes:
    • Data transfer via FTP
    • Duplicate storage on both servers
    • Multiple data file I/O required on both servers
    • System bandwidth constrained by the network

9. CR1 SAN Timeline

  • FY2002 DMF Integration
    • DMF Production Release in December 2001
      • Fully automated Data Migration process
      • 21TB migrated to DMF within 3 months
        • Data migration during off hours
        • Full data access through data migration period
  • FY2003 CXFS Integration
    • SGI CXFS Certified SAN Configuration
      • CXFS On Two IRIX Servers, DMF and PDS
      • SGI TP9400 1TB RAID
      • 8 Port Brocade and 16 Port Brocade fibre switches
    • SGI Installed on 10/8/02
      • Test DMF/CXFS configuration
      • Performed final CXFS testing
    • DMF/CXFS released to production on 11/5/02

10. CR1 SAN Architecture (diagram: DMF server and Product Distribution server on Ethernet; tape drives 8x9840 and 2x9940B; 1Gb and 2Gb fibre links; disk cache volumes /dmf/edc 68GB, /dmf/doqq 547GB, /dmf/guo 50GB, /dmf/pds 223GB, /dmf/pdsc 1100GB)

11. CR1 SAN Architecture

12. CR1 SAN Summary

  • Data Storage
    • 2TB Disk Cache storing 67 Terabytes on the backend (see the cache sketch after this list)
    • 2.5 Million Files
  • 2003 Average Monthly Data Throughput
    • Data ingest 3.5TB
    • Data retrieval 9.6TB
    • Average data throughput of 8.5MB/sec (includes tape access)
  • Minimal System/Ops Administration
  • Single Vendor Solution
    • SGI Software, RAID, and Fibre Switches
    • CXFS supported on SGI IRIX, Linux, Solaris, Windows, etc
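
DMF is SGI's hierarchical storage manager: the 2TB disk cache fronts the 67TB held on tape, and files are migrated out and recalled as needed (see the note in the disk-cache bullet). The sketch below is a generic least-recently-used migration policy, not SGI's actual policy engine; the file names and sizes are invented.

```python
# Generic sketch of a hierarchical-storage-management (HSM) cache policy,
# in the spirit of DMF's disk-cache-in-front-of-tape model. This is NOT
# SGI's implementation; sizes and files below are invented for illustration.

from collections import OrderedDict

class HsmCache:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.resident = OrderedDict()   # filename -> size_gb, ordered by last access

    def access(self, name, size_gb):
        """Read a file: recall it from tape if needed, evicting cold files first."""
        if name in self.resident:
            self.resident.move_to_end(name)     # already on disk; refresh LRU order
            return "disk hit"
        while sum(self.resident.values()) + size_gb > self.capacity_gb and self.resident:
            victim, _ = self.resident.popitem(last=False)   # migrate coldest file to tape
            print(f"  migrate {victim} to tape")
        self.resident[name] = size_gb
        return "recalled from tape"

cache = HsmCache(capacity_gb=2048)      # 2TB disk cache, as on the CR1 SAN
for granule in ["doqq_0001", "pds_0042", "doqq_0001", "guo_0007"]:
    print(granule, "->", cache.access(granule, size_gb=700))
```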

13. Landsat SAN. April 19, 2004. Brian Sauer, SAIC Contractor [email_address]

14. Landsat SAN Goals

  • Improve Overall Performance (3 Hrs -> 1.5 Hrs)
  • Maximize Disk Storage Through Shared Resources
  • Centralized Management (System Admin, Hardware Eng)
  • Overcome Old SCSI RAID Obsolescence (Ciprico 6900)
  • Utilize Existing Investment in Fibre Channel Storage
    • Existing Investment in Ciprico NetArrays
    • Open Solution
  • High Performance
    • Combined throughput of over 240MB/sec
  • High Availability
  • Total Usable Storage over 10TB (see the arithmetic sketch after this list)
  • SGI, Linux and SUN Clients
  • Integrate in Phases as Tasks Become SAN Ready
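
A quick arithmetic sketch can relate the raw-capacity, usable-capacity, and throughput targets above (see the note in the usable-storage bullet). Only the 13TB raw figure (next slide), the 10TB usable target, and the 240MB/sec target come from the presentation; the RAID 3 geometry and the per-array streaming rate are assumptions chosen purely for illustration.

```python
# Sanity-check arithmetic for the Landsat SAN targets. Only raw_tb,
# usable_target_tb and combined_target_mb_s come from the slides; the RAID 3
# geometry and per-array rate are illustrative assumptions.
import math

raw_tb = 13.0                 # raw Ciprico NetArray storage (slide 15)
usable_target_tb = 10.0       # usable storage goal
combined_target_mb_s = 240.0  # combined throughput goal

# Assumed RAID 3 geometry: 8 data disks + 1 parity disk per array.
data_disks, parity_disks = 8, 1
usable_tb = raw_tb * data_disks / (data_disks + parity_disks)
print(f"usable capacity: {usable_tb:.1f} TB (target {usable_target_tb:.0f} TB)")

# Assumed sustained rate per array; a 2Gb fibre channel link tops out near
# 200 MB/s, so reaching 240 MB/s combined requires several arrays in parallel.
per_array_mb_s = 90.0
arrays_needed = math.ceil(combined_target_mb_s / per_array_mb_s)
print(f"arrays needed at {per_array_mb_s:.0f} MB/s each: {arrays_needed}")
```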

15. Landsat SAN Overview

  • 13 TB of Raw Storage Utilizing Ciprico NetArrays
  • Three Brocade Switches
  • Eleven Linux and Six SGI Clients
    • Data Capture System Database Server (DDS)
    • Landsat Processing System (LPS)
    • Landsat Archive Management System (LAM)
    • Image Assessment System (IAS)
    • Landsat Product Generation System (LPGS)
  • ADIC StorNext File System Software
    • Shared High Performance File System
  • Qlogic Fibre Channel Host Bus Adapters

16. Landsat OLD Data Flow (diagram: a 14-minute satellite pass is captured by the Capture & Transfer System (CTS); RCC data moves between the CTS, the DCS Database Server (DDS), the L7 Raw CC Archive (LAM), and the L7 Processing System (LPS) via FTP transfers of 24, 24, and 20 minutes; LPS takes 85 minutes to process the RCC into L0Ra data for the L7 L0Ra Archive (LAM))

17. Landsat SAN (diagram: satellite dish feeding LGS and capture systems CTS1-CTS3, which share RAID 3 arrays on the SAN with the DDS; raw data and L0Ra data on the shared volumes are accessed directly by LAM and LPS)

  • Eliminated FTP Transfers (see the timing sketch below)
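
A rough timing sketch, using only the stage durations from the old data-flow diagram above and assuming the stages ran one after another (an assumption), suggests where the roughly 3-hour to 1.5-hour improvement cited in the goals comes from: the FTP transfers account for about an hour of serial copying.

```python
# Rough timing comparison of the old FTP-based Landsat data flow and the SAN
# flow, using the stage durations shown in the old data-flow diagram (slide 16).
# Assumes the stages run one after another, which is a simplification.

old_flow_minutes = {
    "satellite pass (capture)": 14,
    "FTP transfer #1":          24,
    "FTP transfer #2":          24,
    "FTP transfer #3":          20,
    "LPS processing":           85,
}

ftp_minutes = sum(v for k, v in old_flow_minutes.items() if k.startswith("FTP"))
old_total = sum(old_flow_minutes.values())
san_total = old_total - ftp_minutes   # transfers replaced by shared SAN volumes

print(f"old flow: {old_total} min (~{old_total / 60:.1f} h)")
print(f"SAN flow: {san_total} min (~{san_total / 60:.1f} h)")
```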

18. Landsat SAN Summary

  • Advantages
    • Data can be shared in a high-performance environment, reducing the amount of storage needed
    • Increase in overall performance of the Landsat Ground System
    • Open Solution
      • Able to utilize existing equipment
      • Currently testing with other vendors
    • Disk availability for projects during off-peak times (e.g., IAS)
  • Disadvantages / Challenges
    • Challenge to integrate an open solution
      • CIPRICO RAID controller failures
    • Not good for real-time I/O
    • Challenge to integrate into multiple tasks
      • Each task has its own agenda and schedule
      • Individual requirements
      • Difficult to guarantee I/O

19. LP DAAC SAN Forum. April 19, 2004. Douglas Jaton, SAIC Contractor [email_address]

20. LP DAAC Data Pool Phase I SAN Goals

  • Phase I Data Pool Implementation in early FY03
  • Access/Distribution Method (ftp site):
    • Support increased electronic distribution
    • Reduce need to pull data from archive silos
    • Reduce need for order submissions (and media/shipping costs)
    • Give science and applications users timely, direct access to data, including machine access
    • Allow users to tailor their data views to more quickly locate the data they need
  • The Data Pool SAN infrastructure effectively acts as a subset archive of the full ECS archive

21. LP DAAC Data Pool (SAN) Configuration

  • Data Pools are an additional subset inventory of science data (granule, browse, metadata) that reside in a separate inventory database, with their physical files resident on local storage area network (SAN = 44TB)
    • STK D178 RAID racks with 1 Sun E450 metadata server.
    • Data Pool inventory is managed via a 2nd Sybase inventory database
  • Data pool contents are populated from the primary ECS archive.
    • Subscriptions can be fully qualified with the population occurring at insert time in the primary ECS archive (a function of ingest) (forward population)
    • Historical data load from primary ECS archive via query (historical population capability) in support of science or user requirements.
    • NASA's intent is to grow the on-line store into a working copy of the most popular data
  • Dataset Collections belong to Groups and are configured for N days of persistence and are automatically removed at expiration (rolling archive concept)
    • Managing this 2nd archive and keeping it synchronized with the primary has been problematic and has increased O&M costs.
  • Data Pool Web client(s) and/or anonymous FTP site access are used to navigate contents, browse, access, and download data products. The following directory structure is used (see the path sketch after this list):
    • /datapool//// e.g. /datapool/ops/astt/ast_l1b.001/1999.12.31
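
The directory layout above lends itself to a small helper for composing and parsing Data Pool paths (the path sketch referenced above). The placeholder names (mode, group, collection.version, acquisition date) are inferred from the single example path on the slide and should be treated as assumptions.

```python
# Small helper for composing/parsing Data Pool ftp paths of the assumed form
# /datapool/<mode>/<group>/<collection.version>/<acquisition date>.
# The placeholder names are inferred from the example
# /datapool/ops/astt/ast_l1b.001/1999.12.31 and are assumptions.

from datetime import date

def datapool_path(mode, group, collection, version, acq_date):
    """Build a Data Pool directory path for one day of one collection."""
    return (f"/datapool/{mode}/{group}/"
            f"{collection}.{version:03d}/{acq_date:%Y.%m.%d}")

def parse_datapool_path(path):
    """Split a Data Pool path back into its components."""
    _, root, mode, group, coll_ver, day = path.split("/")
    assert root == "datapool"
    collection, version = coll_ver.rsplit(".", 1)
    return {"mode": mode, "group": group, "collection": collection,
            "version": int(version), "date": day}

p = datapool_path("ops", "astt", "ast_l1b", 1, date(1999, 12, 31))
print(p)                      # /datapool/ops/astt/ast_l1b.001/1999.12.31
print(parse_datapool_path(p))
```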

22. LP DAAC Data Pool Contents & Access

  • Science Data:
  • ASTER L1B Group (TERRA)
    • ASTER collection over U.S. States and Territories (no billing!)
  • MODIS Group (TERRA & AQUA)
    • 8 day rolling archive of daily data for MODIS
    • 12 months of data for higher level products
      • Most 8-day, 16-day, and 96-day products
  • Access Methods:
  • Anonymous FTP Site
  • Web Client interface(s) to navigate & browse data holdings via Sybase inventory database
  • Public Access: http://lpdaac.usgs.gov/datapool/datapool.asp

23. LP DAAC Data Pool Phase II SAN Goals

  • Phase II FY04 Optimize System Throughput (systemic resource):
  • Maximize Disk Storage Through Shared Resources
  • Centralized Management (System Admin, Hardware Engr) of disk
  • High Performance fibre channel connections
    • SGI, Linux and SUN Clients
  • Decrease turn-around time for production and distribution orders.
  • Integrate SAN into ECS subsystems in Phases as tasks become SAN ready/capable
    • Granules will be served from the SAN (Data Pool) if available, rather than staged from tape. Less thrashing of the archives for popular datasets.
      • Effectively allows for more ingest bandwidth as less archive drive contention
      • The trick is to maintain rule sets for popular data to minimize silo thrashing (see the retention sketch after this list)
    • Less copying of data: no need for dedicated read-only caches across ingest, archive staging, production, media (PDS), distribution (FTP push & pull)
  • Fully utilize the SAN infrastructure across the sub-systems of the full ECS archive
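
One way to picture the rule-set idea above (the retention sketch referenced in the silo-thrashing bullet) is a simple policy: a granule stays in the Data Pool while it is inside its persistence window or is still being requested often enough that serving it from disk avoids staging from the tape silos. The thresholds and granule records below are invented, and the real ECS Data Pool rules are more elaborate; this is only a sketch of the rolling-archive concept.

```python
# Illustrative retention rule for a rolling Data Pool archive: keep a granule
# on the SAN while it is inside its persistence window OR still popular enough
# that serving it from disk avoids staging it from the tape silos.
# Thresholds and granule records are invented for this example.
from datetime import datetime, timedelta

PERSISTENCE_DAYS = 8          # e.g. the 8-day rolling archive of daily MODIS data
MIN_HITS_PER_WEEK = 5         # hypothetical popularity threshold

def keep_on_san(inserted, hits_last_week, now):
    """Return True if the granule should stay in the Data Pool."""
    within_window = now - inserted <= timedelta(days=PERSISTENCE_DAYS)
    still_popular = hits_last_week >= MIN_HITS_PER_WEEK
    return within_window or still_popular

now = datetime(2004, 4, 19)
granules = [                           # (name, insert date, ftp hits last week)
    ("daily_granule_a", datetime(2004, 4, 15),  2),   # fresh: keep
    ("daily_granule_b", datetime(2004, 3, 20), 12),   # old but popular: keep
    ("daily_granule_c", datetime(2004, 2, 1),   0),   # old and cold: expire
]

for name, inserted, hits in granules:
    verdict = "keep on SAN" if keep_on_san(inserted, hits, now) else "expire (tape only)"
    print(f"{name:16s} {verdict}")
```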

24. LP DAAC SAN Overview

25. SAN Reality Check. April 19, 2004. Brian Sauer, SAIC Contractor [email_address]

26. EDC SAN Experience

  • Technology Infusion
    • TSSC understands this new technology.
    • Bring it in at the right level and at the right time to satisfy USGS programmatic requirements.
    • SAN technology is not a one-size-fits-all solution set.
    • Need to balance complexity vs. benefits.
  • Project Requirements Differ
    • Size of SAN (Storage, Number Clients, etc)
    • Open System Versus Single Vendor
  • Experiences Gained
    • Provides high performance shared storage access
    • Provides better manageability and utilization
    • Provides flexibility in reallocating resources
    • Requires trained Storage Engineers
    • Complex architecture, especially as the number of nodes increases

27. EDC SAN Reality Check

  • SAN Issues
    • Vendors typically oversell SAN architecture
      • Infrastructure costs
        • Hardware: switches, HBAs, fibre infrastructure
        • Software
        • Maintenance
          • Hardware/Software maintenance
          • Labor
          • Disk maintenance higher than tape
        • Power & cooling of disk vs. tape
      • Complex Architecture
        • Requires additional/stronger System Engineering
        • Requires highly skilled System Administration
      • Lifecycle is significantly shorter with disk vs. tape.

28. EDC SAN Reality Check

  • SAN Issues
    • Difficult to share resources among projects in an enterprise environment
      • Funding a large shared infrastructure has historically been problematic for EDC
      • Ability to allocate and guarantee performance to projects (storage, bandwidth, security, peak vs. sustained); see the sketch after this list
      • Scheduling among multiple projects would be challenging
  • Not all projects require a SAN
    • SAN will not replace the Tape Archive(s) anytime soon
    • Direct attached storage may be sufficient for many projects
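
A toy calculation illustrates the allocation difficulty named above (the sketch referenced in the performance-guarantee bullet): sustained demands can simply be summed against the fabric's capacity, but simultaneous peaks usually cannot all fit and must be scheduled. Every number below, including the project names and the 400MB/sec fabric figure, is an invented placeholder.

```python
# Toy check of whether a shared SAN can honor per-project guarantees.
# All project figures are invented placeholders; the point is that sustained
# demand can be summed, while overlapping peaks have to be scheduled.

SAN_BANDWIDTH_MB_S = 400          # hypothetical fabric capacity

projects = {                      # name: (sustained MB/s, peak MB/s)
    "project_a_ingest":   (40, 180),
    "project_b_ftp_pull": (60, 150),
    "project_c_prodgen":  (30, 200),
}

sustained = sum(s for s, _ in projects.values())
worst_case_peak = sum(p for _, p in projects.values())

print(f"sustained total:   {sustained} MB/s "
      f"({'fits' if sustained <= SAN_BANDWIDTH_MB_S else 'exceeds capacity'})")
print(f"simultaneous peak: {worst_case_peak} MB/s "
      f"({'fits' if worst_case_peak <= SAN_BANDWIDTH_MB_S else 'must be scheduled'})")
```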
