Date posted: 20-May-2015
INFRASTRUCTURE RELIABILITY AND RISK ASSESSMENTS
Morrison Hershfield Mission Critical
Steven Shapiro, P.E., ATD
Mission Critical Practice Lead
Morrison Hershfield Mission Critical
AGENDA
WHAT YOU NEED TO KNOW
• RISK ASSESSMENT
• INFRASTRUCTURE RELIABILITY: POWER, COOLING
Morrison Hershfield Mission Critical – Infrastructure and Risk Assessments
RISK ASSESSMENTS
• WHY
• SITE EVALUATION
• METRICS
Lurking Vulnerabilities
• Location
• Design
• Redundancy level
• Construction
• Quality of equipment
• Age
• Operations & Maintenance program
• Personnel training
• Level of operator coverage
• Thoroughness of the commissioning program
Causes of Critical Failures
• Equipment failure
• Operator error
• Natural disaster
• Design error
• Installation error
• Commissioning or test deficiency
• Maintenance oversight
• Equipment design
Causes of Critical Failures
• Root cause not always easy to ascertain
• Combination of factors (Cascading Failures)
• Latent failures
• Most failures occur during change-of-state events
• More maintenance does not necessarily mean higher availability
• Non-fault-tolerant systems
Causes of Critical Failures
• Commissioning or Test Deficiency: 4%
• Equipment Design: 13%
• Equipment Failure: 28%
• Human Error: 18%
• Installation Error: 10%
• Maintenance Oversight: 4%
• Natural Disaster: 3%
• System Design: 20%
WHY DO A RISK ASSESSMENT
• Aligns the business mission with facility performance expectations
• Quantifies the risk and exposure of the critical facilities to failure
• Identifies vulnerabilities and single points of failure
• Serves as the first step in creating an action plan for site hardening
• Benchmarks the facility against the industry
• Assists in developing the business case for capital expenditures
SITE EVALUATION
STEP 1
• Quantify reliability expectations
• Develop resiliency metrics
STEP 2
• Develop PRA (Probabilistic Risk Assessment) model
• Identify single points of failure within critical systems
• Evaluate redundancy of critical systems
• Capacity and expandability analysis
• Adequacy of engineered systems
• Operation and maintenance policies, practices and procedures
• Adequacy of maintenance and testing programs
• Evaluate risks associated with site location
• Overall risk analysis
• Evaluate the adequacy of operations and maintenance programs
STEP 2 (cont.)
• Harmonics analysis
• EMF studies
• Short circuit & coordination studies
• Air flow modeling-CFD
STEP 3
• Perform gap analysis
STEP 4
• Recommendations for upgrade/alteration to optimize facility performance
• Budget and schedule development
• Assess risk during implementation
• Benchmark findings with industry standards
RISK ASSESSMENT METRICS
• Probability of Failure/Reliability
• Availability
• MTTF
• MTTR
• Susceptibility to natural disasters
• Fault tolerance
• Single Points of Failure
• Maintainability
• Operational readiness
• Maintenance program
INFRASTRUCTURE RELIABILITY
• RELIABILITY / AVAILABILITY
• RELIABILITY MODELING
• RELIABILITY CONSIDERATIONS
RELIABILITY
• “Reliability” is often used as an umbrella term
• It may refer to availability, durability, or quality
• Five 9’s?
• Reliability = probability of successful operation
RELIABILITY AND AVAILABILITY
• Reliability predicts how likely the system is to fail.
• Availability is a measure (or a future prediction) of what percentage of the time the system will be operating properly.
AVAILABILITY
Five 9’s refers to availability.
Availability (A) = average fraction of time something is in service and performing its intended function.
99.999% availability means:
• 5.3 minutes of downtime each year, or
• 1.77 hours of downtime every 20 years
Availability does not specify how often an outage occurs.
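The downtime figures above follow directly from the availability percentage. The short Python sketch below shows the arithmetic; the function name and the use of a 365.25-day average year are assumptions made for illustration, not something stated on the slide.

```python
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR

five_nines = downtime_minutes_per_year(0.99999)
twenty_year_hours = five_nines * 20 / 60

print(f"{five_nines:.2f} minutes per year")
print(f"{twenty_year_hours:.2f} hours per 20 years")
```

The result is roughly 5.3 minutes per year. Note that the same annual total could come from one long outage or many short ones, which is why availability alone says nothing about outage frequency.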
AVAILABILITY
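Availability (A) = MTTF/(MTTF + MTTR) = MTTF/MTBF
MTTF: Mean Time To Failure
MTBF: Mean Time Between Failures
MTTR: Mean Time To Repair (downtime)
MTBF = MTTF + MTTR
When MTBF is taken to mean the up-time between failures (MTBF ≈ MTTF), the same ratio is commonly written A = MTBF/(MTBF + MTTR).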
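With MTBF defined as MTTF + MTTR, availability is the up-time share of one failure-and-repair cycle, MTTF/(MTTF + MTTR). The sketch below computes it and also inverts the relation to find the MTTF needed for a target availability. The function names and the numbers used are hypothetical, chosen only to illustrate the calculation.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: up-time divided by the full cycle (MTBF)."""
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours

def required_mttf(target_availability: float, mttr_hours: float) -> float:
    """MTTF needed to reach a target availability for a given repair time."""
    return target_availability * mttr_hours / (1.0 - target_availability)

# Hypothetical component: fails on average every 100,000 h, takes 8 h to repair.
a = availability(100_000, 8)

# Hypothetical target: five 9's with a 4-hour repair time.
mttf_needed = required_mttf(0.99999, 4)

print(f"availability = {a:.6f}")
print(f"required MTTF = {mttf_needed:,.0f} hours")
```

The second function makes the trade-off visible: for a fixed availability target, halving the repair time halves the MTTF the equipment must achieve.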
RELIABILITY BATHTUB CURVE
[Figure: failure rate versus time (t) in years, showing the early-life, useful-life and wear-out periods]
RELIABILITY MODELING
• Used to compare system designs and assist in the evaluation of risk versus the cost to mitigate the risk.
• Failure and Repair data comes from IEEE 493, Recommended Practice for Design of Reliable Industrial and Commercial Power Systems (IEEE Gold Book)
Components used for reliability modeling of the electrical system:
• Utility power
• Generator
• Circuit breakers
• Switchboards
• Cables
• Automatic Transfer Switch
• UPS module
• Battery
• Static Bypass Switch
• Rack Power
Reliability Block Diagram (RBD)
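The diagram itself is not reproduced here, but the way an RBD is evaluated can be sketched in a few lines of Python. Blocks in series must all work, so their reliabilities multiply; blocks in parallel fail only if every branch fails. The block values below are hypothetical, not IEEE 493 data, and the model assumes independent failures and a perfect transfer between sources.

```python
from math import prod

def series(*reliabilities: float) -> float:
    """All blocks must work: multiply the reliabilities."""
    return prod(reliabilities)

def parallel(*reliabilities: float) -> float:
    """At least one block must work: one minus the product of failure probabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical block reliabilities over a fixed mission time.
utility, generator = 0.95, 0.98
ats, ups, rack_power = 0.999, 0.995, 0.9995

# Utility and generator act as alternative sources feeding one series path.
source = parallel(utility, generator)
system = series(source, ats, ups, rack_power)

print(f"source reliability = {source:.4f}")
print(f"system reliability = {system:.4f}")
```

Even with a redundant source, the series path through the single transfer switch, UPS and rack feed caps the system result, which is the effect of single points of failure on the overall figure.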
[Table not recovered: results of the reliability calculations, with values in hours]
THE TRADITIONAL CLASSIFICATION SYSTEM
The Uptime Institute
Tier 1 – Basic Non-Redundant Data Center: single path for power and cooling distribution without redundant components
Tier 2 – Basic Redundant Data Center: single path for power and cooling distribution with redundant components
Tier 3 – Concurrently Maintainable Data Center: multiple paths for power and cooling distribution with only one path active, and with redundant components
Tier 4 – Fault Tolerant Data Center: multiple active power and cooling distribution paths with redundant components and fault tolerance
TIER REQUIREMENTS
Tier Definitions
                             Tier I   Tier II   Tier III              Tier IV
Number of Delivery Paths     1        1         1 Active, 1 Passive   2 Active
Redundancy                   N        N+1       N+1                   2N Minimum
Compartmentalization         No       No        No                    Yes
Concurrent Maintainability   No       No        Yes                   Yes
Fault Tolerance              No       No        No                    Yes
Availability (%)             99.67    99.75     99.982                99.995
Downtime in Hr/Yr            28.8     22        1.6                   0.4
Data Center Cost
From the Uptime Institute (UI):
• Tier I - $10,000 US/kW of Useable UPS Power Output
• Tier II - $11,000 US/kW of Useable UPS Power Output
• Tier III - $20,000 US/kW of Useable UPS Power Output
• Tier IV - $22,000 US/kW of Useable UPS Power Output
• Plus $225 US/SF of Computer Room
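As a rough illustration of how these figures combine, the Python sketch below adds the per-kW tier rate to the per-square-foot computer room adder. The rates are the ones listed above; the function name and the example facility (1,000 kW, 10,000 SF) are hypothetical.

```python
# Cost figures as listed on the slide (US dollars).
UPS_COST_PER_KW = {"I": 10_000, "II": 11_000, "III": 20_000, "IV": 22_000}
COMPUTER_ROOM_COST_PER_SF = 225

def estimate_cost(tier: str, ups_kw: float, computer_room_sf: float) -> float:
    """Rough cost: per-kW tier rate plus the per-square-foot adder."""
    return UPS_COST_PER_KW[tier] * ups_kw + COMPUTER_ROOM_COST_PER_SF * computer_room_sf

# Hypothetical facility: 1,000 kW of usable UPS output, 10,000 SF of computer room.
for tier in UPS_COST_PER_KW:
    print(f"Tier {tier}: ${estimate_cost(tier, 1_000, 10_000):,.0f}")
```

For this example the step from Tier II to Tier III is by far the largest jump, which is where the business case for the added reliability has to be made.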
HOW MUCH REDUNDANCY IS ENOUGH?
Assumptions
• Various configurations examined: single or dual utility feeders, UPS, generators, STSs, single or dual cords
• Compare reliability at 2000 kW and 4000 kW load
• 5-year probability of failure
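A probability of failure over a fixed period can be related to MTBF under a simplifying assumption. The Python sketch below assumes a constant failure rate (the flat, useful-life part of the bathtub curve), so that Pf(t) = 1 - exp(-t/MTBF). This is an illustrative model only, not necessarily the one used for the study, and the MTBF values are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760

def probability_of_failure(mtbf_hours: float, years: float) -> float:
    """Chance of at least one failure in the period, assuming a constant failure rate."""
    mission_hours = years * HOURS_PER_YEAR
    return 1.0 - math.exp(-mission_hours / mtbf_hours)

# Hypothetical MTBF values for two candidate configurations.
pf_a = probability_of_failure(500_000, 5)
pf_b = probability_of_failure(250_000, 5)

print(f"Configuration A: {pf_a:.1%} over 5 years")
print(f"Configuration B: {pf_b:.1%} over 5 years")
```

Expressing the comparison as a 5-year probability makes it easier to weigh configurations against each other than comparing raw MTBF hours.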
Reliability Considerations
Single utility feeder, parallel redundant UPS and generators, single cord IT equipment
2N UPS, N+1 Generators, ASTSs, Dual Cord Rack
Two Utility Feeders, 2(N+1) UPS, 2(N+1) Generators, ASTSs, Dual Cord Rack
Distributed Redundant UPS, N+2 Generators, Two Utility Feeders, ASTSs and Dual Cord Rack
Emergency Diesel Generators
Study performed by Idaho National Engineering Laboratory at nuclear power plants, February 1996
[Chart not recovered. Failure categories shown: fail to start, fail after ½ hour, fail after 8 hours, fail after 24 hours]
Reliability Considerations
• 2(N+1) UPS/generator with dual utility feeders is the most reliable topology
• 2(N+1) UPS is more reliable than 2N UPS by a small margin
• 2N is more reliable than distributed redundant by a small margin
• Reliability improves significantly if a second utility feeder is provided
• N+2 and/or 2N generator systems are more reliable than N+1
• A hybrid configuration in a hybrid facility is sometimes the best solution
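The generator comparison above can be illustrated with a k-out-of-n calculation: the plant works if at least N of the installed units run. The Python sketch below assumes identical, independent units and a hypothetical per-unit reliability of 0.95 with N = 3; it is a simplified model, not the method used for the study.

```python
from math import comb

def k_of_n_reliability(k: int, n: int, unit_reliability: float) -> float:
    """Probability that at least k of n identical, independent units work."""
    p = unit_reliability
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

needed = 3   # N: generators required to carry the load (hypothetical)
p = 0.95     # hypothetical probability that one generator starts and runs

n_plus_1 = k_of_n_reliability(needed, needed + 1, p)
n_plus_2 = k_of_n_reliability(needed, needed + 2, p)
two_n = k_of_n_reliability(needed, 2 * needed, p)

print(f"N+1: {n_plus_1:.6f}")
print(f"N+2: {n_plus_2:.6f}")
print(f"2N:  {two_n:.6f}")
```

Under these assumptions each added spare cuts the chance of falling short of N running units, with diminishing returns as more spares are added.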
Reliability Considerations
• Assess the condition of the mechanical plant in conjunction with the electrical system
• The facility reliability will be driven by the least reliable component (typically the electrical infrastructure)
System Reliability Block
[Diagram not recovered. Blocks shown: electrical system powering the critical load; mechanical system supporting critical load]

System                                    MTBF      Availability   Pf (3 years)
Electrical system alone                   330,184   0.99999        8.10%
Mechanical system alone                   178,611   0.999943       11.70%
Electrical system supporting mechanical   108,500   0.999985       21.40%
Overall mechanical system                 70,087    0.999931       29.20%
Combined electrical mechanical system     57,819    0.999922       36.90%
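As a plausibility check on the combined row, the electrical system and the overall mechanical system can be treated as two blocks in series: failure rates add and availabilities multiply. The Python sketch below uses the MTBF and availability values from the table; the series model, the assumption of constant failure rates and the function names are illustrative, since the model behind the slide is not shown.

```python
def series_mtbf(*mtbfs: float) -> float:
    """Series blocks with constant failure rates: the rates (1/MTBF) add."""
    return 1.0 / sum(1.0 / m for m in mtbfs)

def series_availability(*availabilities: float) -> float:
    """Series blocks: every block must be available at the same time."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# Values taken from the table rows for the two subsystems.
electrical = {"mtbf": 330_184, "availability": 0.99999}
overall_mechanical = {"mtbf": 70_087, "availability": 0.999931}

combined_mtbf = series_mtbf(electrical["mtbf"], overall_mechanical["mtbf"])
combined_availability = series_availability(
    electrical["availability"], overall_mechanical["availability"]
)

print(f"combined MTBF         = {combined_mtbf:,.0f}")
print(f"combined availability = {combined_availability:.6f}")
```

The series estimate lands close to the table's combined row (about 57,815 against 57,819 for MTBF, and about 0.999921 against 0.999922 for availability). Small differences are expected because the table values are rounded.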
The Cost of Reliability
[Chart not recovered. Axes: reliability (%) with ticks from 99.0 to 99.9999, and cost from $ to $$$$$]
Key Takeaways – Risk Assessment
• What Reliability Level Do You Really Need Based on Your Business Case?
• Minimize Single Points of Failure
• Concurrent Maintainability?
• Fault Tolerance?
• Ensure Adequacy of Operations, Maintenance and Testing Programs
• How to justify the cost to upgrade from the present state?
Key Takeaways – Reliability
• Design objective – find optimum compromise between cost and reliability
• Size matters – larger facilities yield lower reliability
• System architecture and design implementation play a more important role than equipment selection
• Segregate the system into independent blocks
• Eliminate common source components to minimize fault propagation (e.g., LBS, hot-tie, manual bus ties)
• Move single points of failure as close to the load as possible
• Always maintain two independent sources of power to the critical load
• Optimize the design of monitoring and controls circuits
• Keep it simple, minimize human intervention, and utilize automation
QUESTIONS?
Thank you and please feel free to contact me
Steven Shapiro, PE
[email protected]
www.linkedin.com/in/stevenshapirope
References:
Uptime Institute White Papers:
• Tier Myths and Misconceptions
• Data Center Site Infrastructure Tier Standard: Topology
Building Areas/Systems Reviewed
• General Construction
• Electrical
• Mechanical
• Plumbing and Fire Protection
• Operation and Maintenance
• Security
• Load Density
Site Reliability
• Is the project compatible with zoning?
• Natural environment issues
  ◦ Seismic zone
  ◦ Geotechnical reports
  ◦ Subsurface conditions
  ◦ Tornado/hurricane risk
  ◦ Site flood potential
  ◦ Fire potential
  ◦ Site topography
  ◦ Weather extremes
• Man-made environment issues
  ◦ Power, data and communication, water supply, and sanitary sewer availability
  ◦ ISP connectivity to mirror and DR sites
  ◦ Proximity of hazardous operational facilities, e.g., nuclear power plants, military bases, chemical plants, tank farms, water/sewage treatment plants, dams/reservoirs, gas stations, etc.
  ◦ Distance to airports and freeways
  ◦ Distance to emergency services, e.g., fire and police departments, hospital
Building Areas/Systems Reviewed (cont.)
• Building Utilities and Physical Issues
  ◦ General building systems and area characteristics
  ◦ Life safety and environmental
• Electrical Systems
  ◦ Utility feeders
  ◦ Service entry
  ◦ Base building electrical distribution system including busways, step-down transformers, switchgear and distribution panels
  ◦ Uninterruptible power supply (UPS) systems
  ◦ Battery systems
  ◦ Power distribution system including the critical computer rooms
  ◦ Emergency/standby generator and fuel system
  ◦ Normal/standby power transfer switchgear
  ◦ Grounding
  ◦ Emergency Power Off systems
  ◦ Lightning protection system
  ◦ Fire alarm and smoke detection systems
Building Areas/Systems Reviewed (cont.)
• Mechanical Systems
  ◦ Critical systems chilled water plant: chillers, pumps, piping distribution system, controls, etc.
  ◦ Critical systems condenser water system: cooling towers, pumps, piping, etc.
  ◦ Critical systems air handling systems
  ◦ Critical systems air distribution
  ◦ Critical systems secondary chilled water loop
  ◦ Fuel oil systems
  ◦ Boiler systems
  ◦ Compressed air systems
• Plumbing Systems
  ◦ Domestic water systems
  ◦ Natural gas systems
  ◦ Fire suppression systems (water and gaseous)
• Operation and Maintenance of the Critical Support Systems
  ◦ Maintenance procedures and programs
  ◦ Normal operating procedures
  ◦ Emergency operating procedures
  ◦ Training programs and methods
  ◦ Spare parts
Building Areas/Systems Reviewed (cont.)
• Building Automation
  ◦ Building automation systems
  ◦ Physical security systems
  ◦ Access control
  ◦ Intrusion detection
  ◦ CCTV systems
  ◦ ID badging systems
  ◦ Intercom systems
  ◦ Smoke purge systems
• Technology Systems
  ◦ Entrance facility feeds
  ◦ Telephone company services
• Systems Integration
  ◦ The integration, compatibility and interaction of the above systems with each other, as well as with the other building elements, will be reviewed to ensure that the systems are compatible and fully integrated.