Date post: 18-Jan-2018
Upload: darrell-reeves
CIT 470: Advanced Network and System Administration Slide #1 CIT 470: Advanced Network and System Administration Data Centers
CIT 470: Advanced Network and System Administration

Data Centers

Topics

Data Center: A facility for housing a large amount of computer or communications equipment.

1. Racks
2. Power
3. PUE
4. Cooling
5. Containers
6. Economics

Google DC in The Dalles

Located near 3.1GW hydroelectric power station on Columbia River

Inside a Data Center

Inside a Container Data Center

Data Center is composed of:
• A physically safe and secure space
• Racks that hold computer, network, and storage devices
• Electric power sufficient to operate the installed devices
• Cooling to keep the devices within their operating temperature ranges
• Network connectivity throughout the data center and to places beyond

Data Center Components

Data Center Tiers

See http://uptimeinstitute.org/ for more details about tiers.

Racks: The Skeleton of the DC
• 19” rack standard
– EIA-310D
– Other standard numbers.
• NEBS 21” racks
– Telecom equipment.
• 2-post or 4-post
• Air circulation (fans)
• Cable management
• Doors or open

Rack Sizes

http://www.gtweb.net/rackframe.html

Rack Purposes

Organize equipment
– Increase density with vertical stacking.
Cooling
– Internal airflow in rack cools servers.
– Data center airflow determined by arrangement of racks.
Wiring Organization
– Cable guides keep cables within racks.

Rack Power Infrastructure
• Different power sockets can be on different circuits.
• Individual outlet control (power cycle).
• Current monitoring and alarms.
• Network managed (web or SNMP).

Blade Servers

Buying a Rack

Buy the right size
– Space for servers.
– Power, patch panels, etc.
Be sure it fits your servers.
– Appropriate mounting rails.
– Shelves for non-rack servers.
Environment options
– Locking front and back doors.
– Sufficient power and cooling.
– Power/environment monitors.
– Console if needed.

Space

Aisles
– Wide enough to move equipment.
– Separate hot and cold aisles.
Hot spots
– Result from poor air flow.
– Servers can overheat even when the average room temperature is low.
Work space
– A place for SAs to work on servers.
– Desk space, tools, etc.
Capacity
– Room to grow.

Data Center Power Distribution

http://www.42u.com/power/data-center-power.htm

UPS (Uninterruptible Power Supply)

Provides emergency power when utility fails
– Most use batteries to store power
Conditions power, removing voltage spikes

Standby UPS
• Power will be briefly interrupted during switch
• Computers may lock up/reboot during interruption
• No power conditioning
• Short battery life
• Very inexpensive

http://myuninterruptiblepowersupply.com/toplogy.htm

Online UPS
• AC -> DC -> AC conversion design
• True uninterrupted power without switching
• Extremely good power conditioning
• Longer battery life
• Higher price

http://myuninterruptiblepowersupply.com/toplogy.htm

Power Distribution Unit (PDU)

Takes high voltage feed and divides into many 110/120 V circuits that feed servers.

– Similar to breaker panel in a house.

Estimating Per-Rack Power

The Power Problem
• 4-year power cost = server purchase price.
• Upgrades may have to wait for electricity.
• Power is a major data center cost
– $5.8 billion for server power in 2005.
– $3.5 billion for server cooling in 2005.
– $20.5 billion for purchasing hardware in 2005.

Measuring Power Efficiency

PUE is ratio of total building power to IT power; efficiency of datacenter building infrastructure
SPUE is ratio of total server input power to its useful power, where useful power is power consumed by CPU, DRAM, disk, motherboard, etc.
– Excludes losses due to power supplies, fans, etc.
Computation efficiency depends on software and workload and measures useful work done per watt

Power Usage Effectiveness (PUE)

PUE = Data center power / Computer power
– PUE=2 indicates that for each watt of power used to power IT equipment, one watt is used for HVAC, power distribution, etc.
– Decreases towards 1 as DC is more efficient.
PUE variation
– Industry average > 2
– Microsoft = 1.22
– Google = 1.19
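A minimal sketch of the PUE ratio in Python may make the definition concrete. The function names and the sample wattages are illustrative assumptions, not measurements from the slides.

```python
def pue(total_facility_watts: float, it_watts: float) -> float:
    """Power Usage Effectiveness: total data center power / IT equipment power."""
    if it_watts <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_watts / it_watts


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts spent on HVAC, power distribution, etc. for each watt of IT load."""
    return pue_value - 1.0


# Hypothetical facility: 1,000 kW drawn from the utility, 500 kW reaching IT gear.
example = pue(1_000_000, 500_000)
print(example, overhead_per_it_watt(example))
```

With the PUE figures quoted above, a PUE of 1.19 would correspond to roughly 0.19 W of overhead per IT watt.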

Data Center Energy Usage

Sources of Efficiency Losses

UPS
– 88-94% efficiency
– Less if lightly loaded
PDU voltage transformation
– 0.5% or less
Cables from PDU to racks
– 1-3% depending on distance and cable type
Computer Room Air Conditioning (CRAC)
– Delivery of cool air over long distances uses fan power and increases air temperature
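The electrical losses listed above compound, since each stage passes on only a fraction of what it receives. The sketch below multiplies the stage efficiencies to estimate how much utility power reaches the racks. It assumes the stages are simply chained in series, uses the full 0.5% PDU loss in both cases, and ignores CRAC power; the function name is hypothetical.

```python
from math import prod


def delivered_fraction(stage_efficiencies):
    """Fraction of input power that survives a series of lossy stages."""
    return prod(stage_efficiencies)


# Stage order: UPS, PDU transformation, cabling from PDU to racks.
worst_case = delivered_fraction([0.88, 1 - 0.005, 1 - 0.03])
best_case = delivered_fraction([0.94, 1 - 0.005, 1 - 0.01])

print(f"worst case: {worst_case:.1%} of power reaches the racks")
print(f"best case:  {best_case:.1%} of power reaches the racks")
```

Under these assumptions, somewhere between roughly 85% and 93% of the power entering the UPS reaches the servers.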

Cooling a Data Center
• Keep temperatures within 18-27 °C
• Cooling equipment rated in BTUs
– 1 Watt = 3.412 BTUH (1 kW = 3412 BTUH)
– BTUH = British Thermal Unit / Hour
• Keep humidity between 30-55%
– High = condensation
– Low = static shock
• Avoid hot/cold spots
– Can produce condensation

Computer Room Air Conditioning
• Large scale, highly reliable air conditioning units from companies like Liebert.
• Cooling capacity measured in tons.
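Since cooling is rated in BTUH and tons while IT load is measured in watts, a small conversion helper can be useful. This sketch relies on two standard conversion factors: 1 W of heat is about 3.412 BTU/h (so 1 kW is about 3412 BTU/h), and 1 ton of refrigeration is 12,000 BTU/h. The function names and the 10 kW rack are illustrative assumptions.

```python
BTUH_PER_WATT = 3.412    # 1 W of heat is about 3.412 BTU per hour
BTUH_PER_TON = 12_000    # 1 ton of refrigeration = 12,000 BTU per hour


def watts_to_btuh(watts: float) -> float:
    """Convert an electrical/heat load in watts to BTU per hour."""
    return watts * BTUH_PER_WATT


def cooling_tons_for_load(watts: float) -> float:
    """Tons of cooling needed to remove the heat from a load given in watts."""
    return watts_to_btuh(watts) / BTUH_PER_TON


# Hypothetical 10 kW rack: nearly all electrical input ends up as heat.
rack_watts = 10_000
print(f"{watts_to_btuh(rack_watts):.0f} BTU/h, "
      f"{cooling_tons_for_load(rack_watts):.2f} tons")
```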

Waterworks for Data Center

Estimating Heat Load

Hot-Cold Aisle Architecture
• Server air intake from cold aisles
• Server air exhaust into hot aisles
• Improve efficiency by reducing mixing of hot and cold air

Free Cooling
• Cooling towers dissipate heat by evaporating water, reducing or eliminating need to run chillers
• Google Belgium DC uses 100% free cooling

Improving Cooling Efficiency

Air flow handling: Hot air exhausted by servers does not mix with cold air, and path to cooling coil is very short so little energy is spent moving air.

Elevated cold aisle temperatures: Cold aisle of containers kept at 27 °C rather than 18-20 °C.

Use of free cooling: In moderate climates, cooling towers can eliminate majority of chiller runtime.

Server PUE (SPUE)

Primary sources of inefficiency
– Power Supply Unit (PSU) (70-75% efficiency)
– Voltage Regulator Modules (VRMs)
• Can lose more than 30% power in conversion losses
– Cooling fans
• Software can reduce fan RPM when not needed

SPUE ratios of 1.6-1.8 are common today
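Because PUE covers the building and SPUE covers the server, one way to see their combined effect is to multiply them. The sketch below does that; treating the product as the total overhead is an assumption of this example rather than something the slides state, and the sample values (PUE 2.0, SPUE 1.7) are picked from the ranges quoted in this deck.

```python
def combined_overhead(pue: float, spue: float) -> float:
    """Watts drawn from the utility per watt reaching CPU, DRAM, disk, etc."""
    return pue * spue


def useful_fraction(pue: float, spue: float) -> float:
    """Fraction of utility power that ends up in the server's useful components."""
    return 1.0 / combined_overhead(pue, spue)


# Illustrative values: an industry-average building and a typical server.
print(combined_overhead(2.0, 1.7))          # utility watts per useful watt
print(f"{useful_fraction(2.0, 1.7):.1%}")   # share of power doing useful work
```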

Power Supply Unit Efficiency

80 PLUS initiative to promote PSU efficiency
– 80+% efficiency at 20%, 50%, 100% of rated load
– Can be less than 80% efficient at idle power load

First 80 PLUS PSU shipped in 2005

Server Useful Power Consumption

Device                                 Power Usage
Intel Xeon W5590 3.33 GHz Quad Core    130 W
Intel Xeon E5430 2.66 GHz Quad Core    80 W
Intel Xeon E5502 2.13 GHz Dual Core    80 W
7200RPM Hard Drive                     7 W
10,000RPM Hard Drive                   14 W
15,000RPM Hard Drive                   20 W
DDR2 DIMM                              1.65 W
Video Card                             20-120 W

The best method to determine power usage is to measure it: https://www.wattsupmeters.com/
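When a meter is not available, the table above can feed a rough estimate. The sketch below sums component wattages for a hypothetical server configuration and scales the result by an assumed SPUE to approximate input power at the plug. The configuration, the SPUE value of 1.6, and the function names are all assumptions for illustration; motherboard and other unlisted components are left out.

```python
# Component wattages taken from the table above.
COMPONENT_WATTS = {
    "Intel Xeon E5430": 80.0,
    "7200RPM Hard Drive": 7.0,
    "DDR2 DIMM": 1.65,
}


def useful_power(config: dict) -> float:
    """Sum of component power for a {component name: quantity} configuration."""
    return sum(COMPONENT_WATTS[name] * qty for name, qty in config.items())


def estimated_input_power(config: dict, spue: float = 1.6) -> float:
    """Approximate wall power: useful power scaled by the server's SPUE."""
    return useful_power(config) * spue


# Hypothetical dual-socket server with 8 DIMMs and 2 disks.
server = {"Intel Xeon E5430": 2, "DDR2 DIMM": 8, "7200RPM Hard Drive": 2}
print(f"useful: {useful_power(server):.1f} W, "
      f"input: {estimated_input_power(server):.1f} W")
```

Such an estimate is only a starting point; as the slide says, measuring is the more reliable method.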

Server Utilization ~10-50%

Figure 1. Average CPU utilization of more than 5,000 servers during a six-month period. Servers are rarely completely idle and seldom operate near their maximum utilization, instead operating most of the time at between 10 and 50 percent of their maximum utilization.

It is surprisingly hard to achieve high levels of utilization of typical servers (and your home PC is even worse).

“The Case for Energy-Proportional Computing,” Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007

Server Power Usage Range: 50-100%

Figure 2. Server power usage and energy efficiency at varying utilization levels, from idle to peak performance. Even an energy-efficient server still consumes about half its full power when doing virtually no work.

Energy efficiency = Utilization / Power

“The Case for Energy-Proportional Computing,” Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007
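To see why low utilization hurts, the relationship Energy efficiency = Utilization / Power can be sketched numerically. The code below assumes a simple linear power model in which an idle server draws half of its peak power and power rises linearly to 100% at full utilization. That linear shape is an assumption of this sketch; the figure in the paper is a measured curve.

```python
def relative_power(utilization: float, idle_fraction: float = 0.5) -> float:
    """Power as a fraction of peak, assuming a linear rise from idle to peak."""
    return idle_fraction + (1.0 - idle_fraction) * utilization


def energy_efficiency(utilization: float, idle_fraction: float = 0.5) -> float:
    """Utilization divided by relative power (1.0 means fully proportional)."""
    return utilization / relative_power(utilization, idle_fraction)


# Efficiency across the 10-50% band where servers spend most of their time.
for u in (0.1, 0.3, 0.5, 1.0):
    print(f"utilization {u:.0%}: power {relative_power(u):.0%} of peak, "
          f"efficiency {energy_efficiency(u):.2f}")
```

Under this model a server at 10% utilization still draws 55% of peak power, so its efficiency is under one fifth of what it achieves at full load.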

Server Utilization vs. Latency

[Figure: latency plotted against utilization, with utilization running up to 100%]

Improving Power Efficiency

Application consolidation
– Reduce the number of applications by eliminating old applications in favor of new ones that can serve the purpose of multiple old ones.
– Allows elimination of old app servers.
Server consolidation
– Use single DB for multiple applications.
– Move light services like NTP onto shared boxes.
Use SAN storage
– Local disks typically highly underused
– Use SAN so servers share single storage pool

Improving Power Efficiency

Virtualization
– Host services on VMs instead of on physical servers
– Host multiple virtual servers on single physical server
Only-as-needed Servers
– Power down servers when not in use
– Works best with cloud computing
Granular capacity planning
– Measure computing needs carefully
– Buy minimal CPU, RAM, disk configuration based on your capacity measurements and forecasts

Containers

Data Center in a shipping container.
– 4-10X normal data center density.
– 1000s of servers.
– 100s of kW of power.
Advantages
– Efficient cooling
– High server density
– Rapid deployment
– Scalability

Vendor offerings: http://www.datacentermap.com/blog/datacenter-container-55.html

Microsoft Chicago Data Center

Google Container Patents

Containers docked at central power spine

Container air flow diagram, with a center cold aisle and hot air return behind servers

Vertical stack of containers

Data Center Failure Events

Hardware Isn’t Reliable Enough

If servers are 99% reliable, then
– a system with 10 servers is 0.99^10 ≈ 90% reliable
– a system with 100 servers is 0.99^100 ≈ 37% reliable

[Figure: plot of 0.99^n for n from 1 to 97, y-axis from 0 to 1]
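The 0.99^n arithmetic is easy to reproduce. This sketch assumes that server failures are independent and that the system is only "reliable" when every server is up, which is the implicit model behind the figures on this slide; the function name is hypothetical.

```python
def system_reliability(server_reliability: float, n_servers: int) -> float:
    """Probability that all n independent servers are working at once."""
    return server_reliability ** n_servers


# Reliability falls quickly as the number of required servers grows.
for n in (1, 10, 100):
    print(f"{n:>3} servers: {system_reliability(0.99, n):.1%}")
```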

Fault-Tolerant Architecture

Must use fault-tolerant software architecture
– Hardware must detect faults
– Hardware must notify software in timely fashion
Fault-tolerant architecture reduces costs
– Choose hardware reliability level that maximizes cost efficiency, not just reliability
Fault-tolerant architecture can improve performance
– Spreading processing and storage across many servers improves bandwidth and CPU capacity

Causes of Service Disruptions

Total Cost of Ownership (TCO)

TCO = Data Center Depreciation
    + Data Center Operating Expenses (Opex)
    + Server Depreciation
    + Server Operating Expenses (Opex)

Depreciation is the process of allocating cost of assets across period during which assets are used.

Example: server cost = $10,000, $0 residual value
annual depreciation over 4 years = $2500
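The depreciation example corresponds to the straight-line method, where cost minus residual value is spread evenly over the asset's life. A minimal sketch, with a hypothetical function name:

```python
def annual_depreciation(cost: float, residual_value: float, years: int) -> float:
    """Straight-line depreciation: equal write-off each year of the asset's life."""
    if years <= 0:
        raise ValueError("years must be positive")
    return (cost - residual_value) / years


# The example from the slide: a $10,000 server with no residual value over 4 years.
print(annual_depreciation(10_000, 0, 4))
```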

Cost to Build Data Center
• Primary components (power, cooling, space) scale roughly linearly with space.
• 80% of total construction cost goes to power + cooling
• Typical depreciation periods of 10-15 years

Operational Costs

Operational costs include
– Electricity
– Salaries for personnel
– Server maintenance contracts
– Software licenses
Larger data centers are cheaper
– Smaller number of sysadmins per server
– Fixed number of security guards
For multi-MW data center, $0.02-$0.08 per watt per month

Case Study

Tier 3 multi-MW data center
– Dell 2950 III EnergySmart servers (300W, $6000)
– Cost of electricity is 6.2¢/kWh
– Servers financed with 3-year loan @ 12%
– Cost of DC construction is $15/W, 12-yr lifetime
– DC opex is 4¢/W per month
– PUE = 2.0
– Server lifetime is 3 years
– Server maintenance is 5% of capex
– Server avg power = 75% peak
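The case study inputs can be turned into a rough monthly cost per server. The sketch below is one possible reading, not the calculation from the lecture: it assumes electricity is priced per kWh, DC opex is charged per watt of server peak power per month, DC construction is depreciated straight-line without interest, maintenance is 5% of server price per year, and a month has 730 hours. All function and parameter names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12


def loan_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment on an amortized loan."""
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)


def monthly_tco_per_server(
    server_watts=300.0, server_price=6000.0, electricity_per_kwh=0.062,
    loan_rate=0.12, loan_years=3, dc_cost_per_watt=15.0, dc_lifetime_years=12,
    dc_opex_per_watt_month=0.04, pue=2.0, maintenance_fraction=0.05,
    avg_power_fraction=0.75,
) -> dict:
    """Break the monthly cost of one server into the four TCO terms plus power."""
    avg_kw = server_watts * avg_power_fraction * pue / 1000  # includes PUE overhead
    return {
        "server_loan": loan_payment(server_price, loan_rate, loan_years),
        "server_maintenance": server_price * maintenance_fraction / 12,
        "dc_depreciation": dc_cost_per_watt * server_watts / (dc_lifetime_years * 12),
        "dc_opex": dc_opex_per_watt_month * server_watts,
        "electricity": avg_kw * HOURS_PER_MONTH * electricity_per_kwh,
    }


costs = monthly_tco_per_server()
for item, dollars in costs.items():
    print(f"{item:<20} ${dollars:8.2f}")
print(f"{'total':<20} ${sum(costs.values()):8.2f}")
```

Under these assumptions the server loan dominates the monthly cost, with data center depreciation and electricity as the next largest items.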

Key Points

Data center components
– Physically secure space
– Racks, the DC skeleton
– Power, including UPS and PDU
– Cooling
– Networking
Power efficiency (server cost = 4 years power on avg)
– PUE = Data center power / IT equipment power
– Most power in traditional DC goes to cooling, UPS
– SPUE = Server PUE; inefficiencies from PSU, VRM, fans
Cooling
– Heat load estimation
– Air flow control (hot/cold aisle architecture or containers)
– Higher cold air temperatures (27 °C vs. 20 °C)
– Free cooling (cooling towers)
TCO = DC depr + DC opex + Svr depr + Svr opex

References

1. Luiz André Barroso and Urs Hölzle, The Case for Energy-Proportional Computing, IEEE Computer, Vol 40, Issue 12, December 2007.
2. Luiz André Barroso and Urs Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 1st edition, Morgan and Claypool Publishers.
3. Xiaobo Fan, Wolf-Dietrich Weber, Luiz André Barroso, Power provisioning for a warehouse-sized computer, ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture.
4. Thomas A. Limoncelli, Christina J. Hogan, and Strata R. Chalup, The Practice of System and Network Administration, Second Edition, Addison-Wesley Professional, 2007.
5. Evi Nemeth, Garth Snyder, Trent R. Hein, Ben Whaley, UNIX and Linux System Administration Handbook, 4th edition, Prentice Hall, 2010.

