Page 1: [IEEE 2012 IEEE 26th International Conference on Advanced Information Networking and Applications (AINA) - Fukuoka, Japan (2012.03.26-2012.03.29)] 2012 IEEE 26th International Conference

A Scalable WSN based Data Center Monitoring Solution with Probabilistic Event Prediction

Sunil Kumar Vuppala, Animikh Ghosh, Ketan A Patil, Kumar Padmanabh
Infosys Labs

Infosys Limited, Bangalore, India
Email: {sunil_vuppala,animikh_ghosh,ketan_patil}@infosys.com

Abstract—The two most important objectives of data center operators are to reduce operating cost and minimize carbon emission. Consolidation of data centers is not always possible, and big enterprises end up having data centers in multiple locations across different cities and countries. In such a diverse deployment, manual monitoring is not a cost-effective solution. ASHRAE [1] suggested considering energy efficiency as a key factor in data center design. Our initial experiments reveal that a reduction of one degree Celsius in data center room temperature results in 4% excess consumption of electricity. We developed a WSN based data center monitoring (DCM) solution which includes a hardware system and an enterprise application. We deployed the hardware system in hundreds of locations in 7 different cities and monitored them from a central enterprise application dashboard. In this paper, we describe the system architecture and analyze data that was captured over nine months. This is one of the largest real-life WSN deployments, and based on the results we argue that the manual monitoring cost of data centers is reduced by 80%. This deployment also helped avoid a significant amount of carbon emission. DCM also provides a mechanism to predict events in real time.

Keywords-data center, monitoring, control, gateways, motes, cost, DCM

I. INTRODUCTION

Data centers (DC/DCs) are the places in an enterprise where multiple servers host applications or store critical data. There is a need for sophisticated ambience control. The ambient parameters in a DC, such as temperature and humidity, are likely to change based on the varying load on the servers and alterations in outside weather. There are sophisticated ambient control systems available in the market [2] which perform rack- and server-level precision control and can set their operating parameters adaptively to the ambience. However, there are enterprises which have many small DCs spread across different cities or countries. The available complex solutions may not be a cost-effective choice for deployment in small DCs. Wireless sensor networks are emerging as an enabling technology for many applications which require quantities measured at multiple points in the physical world. A typical Wireless Sensor Network (WSN) consists of tiny sensor nodes (termed motes) with the capability of performing integrated sensing, data processing and transmission. In the proposed solution we leveraged the enterprise network along with the WSN to come up with a cost-effective and highly scalable solution. Apart from monitoring, it can be used to reduce carbon emission and predict real-time events.

According to research, energy consumption by DCs within the US has been doubling every 5 years since 2001, and more than 60% of DCs and server rooms experience 1-4 downtimes a year [3] due to changes in operational conditions such as rise in temperature, UPS failure, humidity, water leakage, smoke and inappropriate air flow. Most of the DCs in large corporate organizations (which have a large number of small DCs) are still monitored manually, and optimal working conditions are hardly achieved. In this paper we describe the DCM solution, which monitors all possible ambient parameters and is extensible to incorporate future sensors. DCM is automatic and generates different alerts either to avoid any potential downtime or to save energy. The DCM solution has now been operational in our organization for more than nine months. Initially it was a test bed spread across 7 different cities, which eventually evolved into a full-fledged solution. Following are the contributions of this paper:

1) System Architecture: We describe the system architecture that leverages the internet and ZigBee to aggregate sensor data available both in distant geographies and locally.

2) Gateway-motes Communication Protocol: There is a hierarchical network of sensors. Wireless sensors (motes) report to a gateway over a wireless link. All such gateways form a network and report to an enterprise server via Ethernet. Data collection combines Server Push and Client Pull mechanisms. This communication protocol defines the entire process of data collection.

3) Probabilistic Event Prediction (PED): The system can predict events such as a sudden steep rise or fall in temperature reported by any sensor placed at rack level in the DC. A sudden steep rise in temperature indicates chances of a fire hazard, whereas a steep fall in temperature indicates over-cooling. Over-cooling indicates wastage of energy. This type of real-time event prediction is based on a novel formulation of the Hidden Markov Model (HMM). It enables the DC operators to take preventive measures. This prediction can reduce the chances of accidents (such as fire hazards) or damage to server room equipment due to abnormal heating, over-cooling and many other factors.

The DCM hardware facilitates close monitoring of DC ambient parameters and raises alerts in case of a threshold breach

2012 26th IEEE International Conference on Advanced Information Networking and Applications

1550-445X/12 $26.00 © 2012 IEEE

DOI 10.1109/AINA.2012.94


(thresholds are preset by the administrator). This allows the DCs to be operated in an energy-efficient way by maintaining the operating levels of these ambient parameters at the edge of the ASHRAE [1] permissible limits. The presence of this alert-based DCM solution reduces the manual operating costs of DCs significantly.

The rest of the paper is organized as follows. In section II, we give an overview of existing DCM solutions from the literature. We present the DCM system architecture in section III. The results are analyzed and insights are presented in section IV, and finally the conclusion is drawn in section V.

II. RELATED WORK

Sensors have been deployed to collect data remotely and eliminate human involvement from the system. Sensor networks have turned out to be one of the key solutions for automated, controlled and structured gathering of data continuously. Sensor networks are deployed for measuring precision values in agriculture [4], addressing area coverage problems [5], [6], [7], [8], animal habitats [9], [10], [11], predicting hazardous volcanic eruptions [12], earthquake monitoring [13], forest fires [14] and even for body area sensing [15] using biomedical or actuating sensors.

Worldwide sensor deployment strategies are undertaken for research or industrial purposes [16] to address a variety of the problems described above, which may require outdoor as well as indoor deployment. The key focus of this paper revolves around strategic indoor deployment of sensors. There are various indoor scenarios that may require strategic sensor deployment [17], [18], [19]. One of the challenges at hand is data center monitoring [2]. It involves thermal management [20], temperature-aware workload distribution [21], and secure and efficient data transmission for high-performance data-intensive computing [22]. To address such critical requirements, automated data gathering appears to be necessary, and it may involve deploying field sensors or similar automated monitoring devices.

RACNet [23] provides high temporal and spatial fidelity measurements that can be used towards improvement of DC safety and energy efficiency. RACNet deploys sensors to assure high-fidelity measurement to track wastage of energy, which may be due to unnecessary operational hours of CRAC systems, water chillers and (de)humidifiers. The authors of RACNet state their contributions in terms of safety as tracking heat distribution and predicting thermal runaways.

In the June 2005 issue of AFP Magazine [24], fire hazard is pointed out as one of the costly damages that may affect computers and DCs, so steps to prevent or predict the onset of fire are extremely critical. There are smoke detection [25] and alert systems available, but to the best of our knowledge there does not exist a data center energy monitoring device that comes with inbuilt smoke sensors or sensors to detect water leakage or a wet floor, which are extremely critical to maintaining overall DC safety.

SynapSense [2] mainly addresses environmental savings by developing a DC optimization platform. The SynapSense Data Center Optimization Platform is comprised of sensor nodes, gateways, routers and server platforms. It interprets temperature, humidity and sub-floor pressure differential data from thousands of sense points. It also enables measuring of power and incorporates pre-existing BMS data via BACnet, Modbus and SNMP. It monitors ambient parameters to stay within specified ASHRAE ranges and provides alerts when boundaries are exceeded. However, standard SynapSense projects are used for floors of area above 20,000 square feet with thousands of sensing points, and the data can be used for complex applications.

The InterSeptor Environmental Monitoring System [26] from Jacarta is a full-featured, scalable network environmental monitoring device designed to remotely monitor temperature, humidity and other environmental conditions in DCs, IT rooms and racks. Email and SNMP alerts are available as standard. But the InterSeptor unit does not communicate via wireless, so it cannot form a peer-to-peer network. The absence of such a group-level network operational architecture removes the possibility of pushing edge intelligence to address group-level decision-making ability at the device level. Also, a unit price of approximately $800 may be considered too high for a DC rack-level sensing component by any enterprise.

In summary, these works do not have an extensive application platform, and their hardware cannot incorporate existing sensors in the building, as their hardware is not extensible. Moreover, they do not give alerts to the operator to set operational threshold parameters which would minimize energy consumption.

III. DCM HARDWARE DESCRIPTION

The goals of the system are twofold. The first is to design a cost-effective WSN based monitoring application for geographically distributed small DCs. The second is to avoid unfortunate events in the system by using a prediction model. The entire solution can be broadly classified into four categories. The first is the system components. The second category is the protocols used for communication, and the third is the system architecture that describes how different components collaborate with each other. The last is the methods and functionalities in the system.

A. Components:

The different components of the DCM solution are categorized as hardware, embedded software and application software.

1) Hardware: The DCM hardware comprises a wireless module and a sensor module, as depicted in figures 1(a) and 1(b). The wireless module is also known as a "mote" and is capable of working independently.

The wireless module is based on the CC2431 System-On-Chip from Texas Instruments. The CC2431 has two systems fabricated in a single chip, namely an IEEE 802.15.4 compliant CC2420 RF transceiver and an industry-standard enhanced 8051 MCU (Micro Controller Unit). The chip has 128 KB flash memory and 8 KB RAM. The wireless module has an embedded battery monitoring chip, the DS2438. The presence of


(a) Sensor Module (b) Wireless Module

Fig. 1. Hardware components for DCM

this chip avoids imposing the battery monitoring task on the micro-controller. Thus charging of the wireless module is possible even when the micro-controller is in sleep mode. It also has an on-board Intersil ISL29023 light sensor. For design simplicity and to reduce the cost of the module, we have used the internal temperature sensor of the DS2438 for temperature monitoring purposes. The wireless module is powered by a 900 mAh Lithium-ion battery placed at the bottom of the PCB for space optimization. For an extended range of communication we have used the CC2591 low noise amplifier (LNA), which acts as a booster and is useful for impedance matching. We have tested this wireless module, which is a full-fledged mote, and it gives a range of 800 meters in a line-of-sight environment. We anticipate a lot of interference in DCs, and this power amplifier increases reliability in message delivery.

The sensor module consists of a 16-bit PIC24FJ256GB110 micro-controller from Microchip. This micro-controller supports up to 16 MIPS operation at 32 MHz. It has 16K RAM and 256K ROM. The DCM module has an embedded Honeywell SHT15 humidity sensor and an LM92 temperature sensor from National Semiconductor. The module has the ability to be powered through USB along with a standard power outlet. The module supports serial, USB, Ethernet and ZigBee modes of communication with its peers. It has 4 analog and 4 digital I/O pins. The analog I/O pins have the capability to support a 4-20 mA current loop and 0-5 V analog inputs. These can be used to connect external sensors. Their amplifier is controlled by the software, and hence a variety of sensors can be directly connected. This hardware configuration of the sensor module and wireless module gives the flexibility of reusing the system for different WSN application verticals.
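As a concrete illustration of the 4-20 mA analog interface, the sketch below maps a current-loop reading onto an engineering value. The function name, fault threshold and the 0-50 degree transmitter range are illustrative assumptions, not part of the DCM hardware specification.

```python
def loop_current_to_value(current_ma: float, lo: float, hi: float) -> float:
    """Map a 4-20 mA current-loop reading onto a sensor range [lo, hi].

    4 mA corresponds to the bottom of the range and 20 mA to the top;
    readings well below 4 mA usually indicate a broken loop.
    """
    if current_ma < 3.8:  # open-circuit fault threshold (assumed value)
        raise ValueError("current below 4 mA: possible broken loop")
    span = hi - lo
    return lo + (min(current_ma, 20.0) - 4.0) / 16.0 * span

# Example: a hypothetical 0-50 degree C transmitter on the loop;
# 12 mA sits exactly at mid-range.
mid_scale = loop_current_to_value(12.0, 0.0, 50.0)
```

The same scaling applies to any external transducer wired to the analog pins, which is what makes the interface sensor-agnostic.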

2) Embedded software: MOJO [27] is the middleware to process and extract information from sensor packets. MOJO abstracts the complexities of the wireless sensor network and presents Application Programmable Interfaces (APIs) to the developer. Thus developers need not worry about the physical motes, as the functionalities are available via APIs. Further information about MOJO can be found in [27].

3) Application software: The application software is built on top of the processed sensor data and includes the business logic, database and user interfaces.

B. The network communication

Each DC may need several motes and gateways as per the size of the DC room and the sensor parameters required to be monitored. Each DC has at least one DCM gateway to collect the data from motes. Motes communicate with the gateway using the ZigBee wireless protocol. The mote and gateway used in the DCM implementation are shown in figure 2.

• Communication between gateway and server: The server communicates with the DCM gateway through the Ethernet medium. Similar to a UPnP device, the DCM gateway acts as a DHCP client and searches for a DHCP server when the device is first connected to the network. The communication mechanism involves both Server Push and Client Pull. Discovering the DCM gateway over Ethernet is done using a lightweight device discovery protocol. In this method the network is periodically flooded with multicast UDP packets encrypted with an in-house lightweight encryption algorithm suitable for embedded applications. The UDP packets are picked up by the DCM gateway and decrypted to verify the identity of the server. Due to business strategy, the description of the encryption is outside the scope of this paper. The initial communication setup is completed by sending the DCM gateway description to the server for registering the device. The server push is initiated at regular intervals to collect data from the DCM gateway. The client pull mode of communication is used on the occurrence of a high-priority event that should be reported to the server. For example, in the case of low battery status of a mote, the gateway needs to inform the server immediately and should not wait for the server push.

• Communication between motes and gateway: The DCM gateway communicates with motes associated with it


using the low-power ZigBee wireless protocol. The DCM gateway acts as a master coordinator, polling each mote for health status and data at specific periodic intervals. Addition of new motes to, or deletion of dead motes from, the network is handled by the ZigBee wireless association/disassociation schemes.
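The coordinator polling behavior described above can be sketched as a simple loop; `read_mote` stands in for the actual ZigBee request/response exchange, and all names here are hypothetical rather than the gateway's real firmware API.

```python
from typing import Callable, Dict, Iterable

def poll_motes(mote_ids: Iterable[str],
               read_mote: Callable[[str], dict]) -> Dict[str, dict]:
    """One polling round: the gateway (master coordinator) queries each
    associated mote in turn for health status and sensor data. Motes that
    fail to answer are marked unreachable and become candidates for
    ZigBee disassociation."""
    latest: Dict[str, dict] = {}
    for mote in mote_ids:
        try:
            latest[mote] = read_mote(mote)  # request/response over ZigBee
        except TimeoutError:
            latest[mote] = {"status": "unreachable"}
    return latest
```

A gateway would run such a round at a fixed periodic interval and forward the aggregated results to the server on the next server push.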

The DCM hardware system has temperature, light, humidity and battery monitor sensors, which are available on board. There is provision for connectivity to external sensors along with the on-board sensors. The external sensors used in our deployment are a smoke sensor, a water leak detector, and 30 kVA and 160 kVA UPS status detectors.

Fig. 2. Hardware for DCM

C. System Architecture:

The architecture of the system is depicted in figure 3.

Fig. 3. DCM System Architecture

The motes transmit sensor data to the DCM gateway, which aggregates data from all associated motes and transmits the aggregated data to the server. At the server the data is processed by the MOJO middleware. After the extraction of the sensor readings, the system verifies whether any alert condition is met. If an alert condition is met, the corresponding alert is raised and assigned to the concerned operator of the DC. All controller functionality and the business logic are implemented in the core of the web application, and web interfaces are provided to the end users for viewing and analyzing the aggregated data and alerts. The DCM Solution provides for setting a hierarchy of threshold levels for all the types of sensors located in various DCs. In general, the threshold settings will be the same across the organization for different DCs, but the web interface provides the option to change the threshold values as per need. For example, the room temperature is normal up to 25°C; a first-level alert is raised if it crosses 25°C but stays below 27°C; a second-level alert applies above 27°C and below 30°C; and a high alert is raised above 30°C. Administrators can assign alerts, based on severity and sensor type, to the concerned persons using the DCM Solution. Based on the threshold configuration, the respective alerts will be notified via SMS and Email. The application is easy to configure and deploy. A live deployment of the DCM gateway in a server room is shown in figure 4.
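The threshold hierarchy in the example above can be expressed as a small classifier. The default boundaries follow the 25/27/30°C example in the text; in the real system they are configurable per DC through the web interface.

```python
def alert_level(temp_c: float, thresholds=(25.0, 27.0, 30.0)) -> str:
    """Classify a room temperature against the hierarchy of thresholds:
    normal up to t1, first-level alert up to t2, second-level alert up
    to t3, and high alert beyond t3."""
    t1, t2, t3 = thresholds
    if temp_c <= t1:
        return "normal"
    if temp_c <= t2:
        return "level-1"
    if temp_c <= t3:
        return "level-2"
    return "high"
```

Per-sensor variants of the same ladder (humidity, voltage, and so on) would only differ in the threshold tuple.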

Fig. 4. Live Deployment of DCM Gateway

D. Functionalities and category of users in the system:

Different categories of users in the system are guest user, operator and administrator. Various functionalities in the system are assigned to different users as shown in table I.

TABLE I
FUNCTIONALITY AND USER CATEGORY ACCESSIBILITY

User Type      Monitor  Analysis  Config  UserMgt  Testcases
Guest user     Yes      -         -       -        -
Operator       Yes      Yes       -       -        Yes
Administrator  Yes      Yes       Yes     Yes      Yes

1) Monitoring: This is the set of features that enables display of live sensor data on the DCM Solution dashboard and allows real-time monitoring of alerts from the dashboard by applying location filters at different levels such as country, city, building and DC.

2) Analysis: This set consists of the log files, reports and analysis, which are accessible to registered users and administrators for offline analysis.

3) Configuration: The configuration and settings are accessible to the administrators, who can set the thresholds for individual motes, configure new motes, assign alerts to users, etc.

4) User Management: This set of features supports the user registration process and editing of user profiles. Approval is subject to authorization from the administrator.


5) Test Cases: These features allow the user to test the correctness of the system and sensor data, and are used during deployment and maintenance of the system.

E. APIs used in DCM Solution:

The data transmitted to the DCM application server from the DCM gateways is processed by MOJO, and the live sensor data is available in data structures termed the device cloud, and in the database. The server has different methods based on the sensor data which are specific to a location, such as setData, listAlertThresholds, checkAlerts and AlertNumberGenerator. The server has a few more generic methods, such as alertProcess, to process the alerts and notify via SMS and email.

1) setData: This method updates the sensor data in the device cloud and in the DCM central server database.

2) listAlertThresholds: This method lists the threshold values of all the sensor readings for a mote. Each sensor reading is compared against the alert-threshold list to identify any alerts in the system.

3) checkAlerts: In this method, an alert is checked for each sensor reading against the defined thresholds. Hysteresis behavior is applied to temperature, humidity, light and voltage readings before generating the alerts, to avoid false positive conditions. An alert is generated if the specific condition is satisfied for an alert level. If an alert has been generated for a particular alert level, mote and sensor type, the alert is not generated again unless there is a change in the alert level or there is no response within a 30 minute / one hour (reference values) time frame for that particular alert.
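A minimal sketch of the duplicate-alert suppression described above. The class and method names are hypothetical, and the 30-minute window is one of the reference values from the text.

```python
import time
from typing import Dict, Optional, Tuple

class AlertSuppressor:
    """Once an alert fires for a given (mote, sensor) at some level, it is
    not raised again unless the level changes or the re-alert window
    expires without a response."""

    def __init__(self, realert_after_s: float = 30 * 60):
        self.realert_after_s = realert_after_s
        # (mote, sensor) -> (last alert level, timestamp of that alert)
        self._last: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def should_raise(self, mote: str, sensor: str, level: str,
                     now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        prev = self._last.get((mote, sensor))
        if prev is not None:
            prev_level, prev_ts = prev
            if prev_level == level and now - prev_ts < self.realert_after_s:
                return False  # same level, inside the window: suppress
        self._last[(mote, sensor)] = (level, now)
        return True
```

A production implementation would also clear the entry when the sensor returns to normal, so that a fresh excursion raises a fresh alert.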

4) AlertNumberGenerator: This method generates a unique reference number for the alert, signifying the location, mote and sensor type. The alert is tracked for its status using this reference number.

5) AlertProcess: Once the alert is generated, the server processes the particular alert and notifies the concerned person via SMS, Email or both. If there is no response to the generated alert in the first 30 minutes, then another alert is generated with escalation. These 30 minute and one hour values are taken as reference times, which can be set as per the business application requirements.
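The escalation behavior can be sketched as follows; the function and the recipient wording are illustrative, and the 30-minute window is the reference value named in the text.

```python
def notification_steps(ack_delay_s: float,
                       escalate_after_s: float = 30 * 60) -> list:
    """Return the notifications triggered for one alert, given how long the
    operator took to respond. An alert unacknowledged past the window
    produces a second, escalated notification."""
    steps = ["notify concerned person via SMS/Email"]
    if ack_delay_s > escalate_after_s:
        steps.append("escalated alert: no response within the window")
    return steps
```

Because the window is a parameter, the same logic serves the configurable 30-minute and one-hour reference times mentioned above.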

IV. RESULTS & ANALYSIS

In this section we describe our deployment setup and our method of data collection, and subsequently we analyze the gathered data.

A. Deployment setup:

The experimental setup in our organization is one of the largest real-time WSN deployments in a truly distributed environment. The DCM gateways and motes are spread across thousands of kilometers in various cities in India (Bangalore, Hyderabad, Pune, Mysore, Chennai, Chandigarh and Mangalore), and the system has been operational for more than nine months. The system gives the operators the flexibility to set various threshold levels for each sensor in the motes. We have collected millions of records of sensor data, which were analyzed to draw interesting insights. The data is available for offline analysis and for generation of reports.

There are a number of parameters affecting DC operation, namely the HVAC system, the number of servers, the dimensions of the room and external weather conditions. For the HVAC system, the variables affecting the energy consumption level are outlet water temperature, fan speed and damper positions. Temperature, humidity and carbon dioxide levels together determine a particular comfort level of DC operation. Chiller electrical load can be expressed in terms of percentage of full load amps (FLA), which is an indication of power consumption. There is a relation between electrical power consumption and the outlet temperature set point. The temperature can be measured at four levels in this type of system: chiller-level temperature, HVAC set point temperature, and room- and rack-level temperature inside a DC. In this paper, our experiments are focused on measuring the room-level and rack-level temperatures.

B. Manual Monitoring Cost Savings:

With the introduction of the semi-automated DCM system, we can severely cut down the cost of repetitive manual monitoring. In our organization, the operator visits the server room every 2 hours and logs the temperature and humidity readings manually (Case-1 in Table II; figure 6 represents this case with 26°C as the alert condition). With the introduction of this DCM solution, the operator needs to visit only if an alert is raised (Case-2 in Table II). In response to the alerts generated, the operator needs to fix the corresponding issue, visiting the DC at least once. If the alert is raised for temperature only at one corner or rack level, the operator can check the air flow in that particular place to resolve the issue. If the problem persists, he may need to fix the issue by changing the A.C. temperature. In some cases, for each alert raised, the operator may visit the DC two times: in the first visit the operator fixes the issue by adjusting the AC temperature, and in the second visit he sets the temperature level back to the normal operating range after the system stabilizes. A sample observation over three days is presented in Table II.

TABLE II
MANUAL MONITORING COSTS: NUMBER OF VISITS TO DC

Day    Case-1  Case-2  Remarks
Day-1  12      2       Temperature alert
Day-2  12      1       UPS alert
Day-3  12      2       Temperature alert

We have observed that a saving of over 80% in manualmonitoring costs of the DCs is possible.
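The 80% figure is consistent with the visit counts in Table II; a quick check, assuming monitoring cost is proportional to the number of visits:

```python
# Case-1: scheduled visits every 2 hours -> 12 per day.
# Case-2: alert-driven visits on the three sampled days.
scheduled_per_day = 12
alert_visits = [2, 1, 2]

avg_alert_visits = sum(alert_visits) / len(alert_visits)  # about 1.67/day
saving = 1 - avg_alert_visits / scheduled_per_day         # about 0.86
```

On this three-day sample the saving comes out near 86%, which supports the "over 80%" claim.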

C. Operating levels of DC:

Room level Temperature Experiments: In figure 5, the temperature of the four corners in a DC was recorded from 1:00 PM on 1st June 2011 to 4:00 AM on 2nd June 2011. The readings are within the ASHRAE level and on the lower bound. We have conducted experiments by increasing the temperature level by 2-3 degrees and observed


Fig. 5. Room level temperature of all corners in a DC

the potential energy savings. As per our practical observations, a one degree centigrade (1°C) reduction in HVAC temperature corresponds to a 4% increase in electricity consumption. In effect, close to 10-12% energy savings were observed by increasing the data center AC temperature while still keeping within the ASHRAE levels and still being able to generate the alerts as and when the sensor values cross their thresholds.
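The reported savings follow from treating the 4%-per-degree observation as linear; a first-order sketch (an empirical approximation from this deployment, not a physical model):

```python
def consumption_change(delta_set_point_c: float,
                       pct_per_degree: float = 0.04) -> float:
    """Fractional change in electricity consumption for a change in the
    HVAC set point, under the linear ~4%-per-degree observation. A
    positive delta (raising the set point) gives a negative change,
    i.e. savings."""
    return -pct_per_degree * delta_set_point_c

# Raising the set point by 2.5 C suggests roughly 10% savings, in line
# with the 10-12% observed for a 2-3 degree increase.
savings_fraction = -consumption_change(2.5)
```

The linear form is only credible over the small 2-3 degree window explored here; extrapolating further would need measurement.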

D. The Green Effect:

On average, for each kWh of electricity, 743 g of carbon dioxide [28] is emitted. In the typical example of Infosys, 8% of energy consumption is for the server rooms/DCs. The total energy consumed at Infosys is 250 million units per annum; therefore the server room energy share is a minimum of 20 million units, considering the 8% share. By using our system, we can save up to 10% of this energy, which directly translates to a saving of 1486 tons of carbon dioxide emission in a year (2,000,000 units * 0.743 kg = 1,486,000 kg).
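The arithmetic above can be reproduced directly (figures as given in the text, with "units" taken as kWh):

```python
total_energy_kwh = 250_000_000  # annual consumption at Infosys ("units")
dc_share = 0.08                 # share used by server rooms / DCs
dcm_savings = 0.10              # energy saving achievable with DCM
co2_kg_per_kwh = 0.743          # grid emission factor [28]

saved_kwh = total_energy_kwh * dc_share * dcm_savings
saved_co2_tons = saved_kwh * co2_kg_per_kwh / 1000
# saved_kwh is 2 million units; saved_co2_tons is about 1486 tons per year
```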

E. Observation of Patterns:

The patterns of temperature and humidity were observed over several days at rack level. Data for three consecutive days is presented in the graphs: the temperature in degrees Celsius (y-axis) and the humidity percentage (y-axis) are plotted over a period of 3 days (x-axis) in figure 6 and figure 7 respectively.

Fig. 6. Temperature pattern for 3 days in a DC at rack level

The temperature variation over a day demonstrates periodicity of occurrence: although the DC has a controlled environment, the outside temperature varies. Three factors affect the temperature of the DC: the

Fig. 7. Humidity pattern for 3 days in a DC

temperature of the refrigerant that carries the heat/cold (air in our case), the fan speed, and the area of opening. Though the setpoint is fixed, the temperature of the refrigerant changes with the environment. Based on these results, we recommended to our operators that the fan speed of the HVAC should be changed according to the time of day; there is a similar recommendation for humidity. Thus, while it is imperative to change the operating condition of the HVAC according to the season, our solution enables the operators to exercise precise control according to the time of day. As a future improvement to the DCM solution, we will add a feedback loop for automatic adjustment of temperature.

F. Probabilistic prediction of abnormal temperature fluctuation by novel formulation of Hidden Markov Model (HMM)

Fig. 8. Standard normal density function of continuously collected temperature data with mean µ = 0 and standard deviation σ = 0.092
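As an illustration of the Jarque-Bera normality check applied to data like that of Fig. 8, the sketch below runs the test on synthetic stand-in readings (the real sensor logs are not reproduced here); the random seed and the use of N(0, 0.092²) for the stand-in data are our assumptions:

```python
import numpy as np
from scipy.stats import jarque_bera

# ~7 days of 30-second readings, drawn from the fitted N(0, 0.092^2).
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=0.092, size=20_000)

stat, p_value = jarque_bera(samples)
alpha = 0.05
# Fail to reject the null hypothesis of normality when p > alpha.
print(f"JB statistic = {stat:.3f}, p-value = {p_value:.3f}, normal = {p_value > alpha}")
```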

The real-time temperature data of seven consecutive days (approximately 20,000 data points at an interval of 30 seconds), collected by sensors (motes) placed on the racks of a DC, was analyzed to develop a model to predict events (PED). We performed the Jarque-Bera test [29] to check whether the null hypothesis [30] that the data points come from a normal distribution holds. The observed p-value [30] was 0.20 with the level of significance (α) set to 5 percent; since the observed p-value is greater than α (0.05), we accept the null hypothesis. A graphical view of the data belonging to a normally distributed cluster is depicted in Figure 8. By developing an HMM-based model [31] it was possible to probabilistically predict a sudden rise or fall in temperature based on the current temperature value. In the DC scenario, this



sudden rise or fall in temperature is identified as an event. An HMM is a Markov process typically identified by a set of hidden states and observables. Transitions among the hidden states are governed by a set of probabilities called transition probabilities. Every hidden state has an observable output, emitted with a known conditional probability distribution called the emission probability distribution.

In DC temperature monitoring, the hidden states are based on temperature ranges. We define the hidden states as FREEZE (m°C to (n-1)°C), NORMAL (n°C to (o-1)°C) and HOT (o°C to p°C). A typical DC should operate in the NORMAL range of temperature; a transition to the FREEZE or HOT state indicates over-cooling (wastage of energy) or over-heating (potential threat of fire) respectively.

We first define the transition probability between hidden states, which is the more important scenario as it is a direct indicator of the chance of a sudden rise or fall in absolute temperature. Let X be a random variable representing temperature, and let x_i represent the absolute temperature at time instance i, so that x_i ∈ X, ∀i ∈ {1, 2, 3, ..., n}. The current temperature at time t is x_t and the temperature at time instance (t-1) is x_{t-1}. Suppose the temperature is now in the NORMAL state and we are trying to predict its probability of transiting to either the FREEZE or the HOT state. If x_t > x_{t-1} then there is a chance the temperature may lead to the HOT state, as a rise is observed. We calculate the rate of increase as (x_t - x_{t-1})/x_{t-1}, and predict the temperature at time (t+1), if the same rate of increase continues, as:

x_{t+1} = x_t (1 + (x_t − x_{t-1})/x_{t-1})    (1)

We compute the z-score corresponding to this predicted valueas:

z_{t+1} = (x_{t+1} − µ)/σ.    (2)

Let the area under the normal curve between +z_{t+1} and −z_{t+1}

be Pr percent; then the transition probability is Pr percent, i.e., with probability Pr the system will transit from the NORMAL state (the current state) to the HOT state. Such a probability can also be computed for the transition from the NORMAL state to the FREEZE state if a fall in temperature is observed, i.e., x_t < x_{t-1}.

This probabilistic event prediction enables the concerned people to take evasive action against such an abnormal rate of temperature overflow or underflow, by pushing more or less cold air through the HVAC [32] system, or by taking similar preventive measures.

Since the states are hidden we do not observe them; what we observe is the absolute temperature change (rise or fall). For each of the possible hidden states there exists a set of emission probabilities. Emission probabilities define the distribution of the observed variable at a time, conditioned on the hidden state at that particular time. The question framed for our scenario of computing emission probabilities is: how likely is the current temperature value (x_t) to be in the particular state it should occupy in the ideal scenario? For example, how likely is

26°C to be in the NORMAL range? (Prior knowledge: the NORMAL range is 22°C-27°C.) If x_t = µ, then the reading is most likely to be in the known state: the area under the bell curve between +z and −z is zero at µ, so the probability is 1. At temperature value x_t, we calculate the z-score on the normal curve as:

z = (x_t − µ)/σ    (3)

Corresponding to this z-score there will be some percentage of area under the curve, say p_e percent. We claim that with probability (1 − p_e) the reading belongs to that state.
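A minimal sketch of Eq. (3), using math.erf for the area under the standard normal curve in place of a z-table (the function name is ours; the example values 25.8°C, µ = 24.5°C and σ = 0.56 are those used in the case study that follows):

```python
import math

def emission_probability(x_t, mu, sigma):
    """Eq. (3): z-score of the current reading, then 1 minus the area under
    the standard normal curve between -z and +z."""
    z = abs(x_t - mu) / sigma
    return 1.0 - math.erf(z / math.sqrt(2.0))

# How likely is a 25.8 C reading to belong to the NORMAL state?
p_e = emission_probability(25.8, mu=24.5, sigma=0.56)
print(round(p_e, 4))  # ~0.02, matching the 0.0204 worked out in the case study
```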

G. Practical Case Study

Since the paper is targeted towards server room monitoring, we take the NORMAL range to be the ASHRAE recommended level for DCs, 22°C to 27°C. We defined the HOT state as a wide range from above 27°C up to 80°C, and the FREEZE state as the range below 22°C down to 0°C. We observed the temperature of a rack in our DC to be 25°C on June 3rd, 2011 at 5:00 P.M. (within the NORMAL range). The temperature at the next observed instance, at 5:10 P.M., was 25.8°C (the sensors are sampled at an interval of 10 minutes to maximize battery life). Since we observe a rise in temperature, we are interested in computing the chance of migrating to the HOT state. The rate of increase is computed as 0.032. We then predict the temperature at the next instance if this rate of increase continues; it is calculated to be 26.63°C (refer to "(1)"). The mean and standard deviation of the NORMAL range are 24.5°C (since the range is 22-27°C) and 0.56 (computed from the real-time data logs) respectively. The z-score is computed as 3.8. The area under the standard normal curve between z-values of +3.8 and −3.8 is 0.998, so there is a 99.8 percent chance that the temperature will violate the NORMAL range and transit to the HOT state at the next instance. Based on this probabilistic prediction, the concerned people can take the necessary action to avoid such a scenario. In our server room monitoring, a high chance of transition from the NORMAL to the FREEZE state was also observed at times, which indicates an opportunity to save energy by turning off unnecessary cooling. We now compute the emission probability of the reading observed on 3rd June 2011 at 5:10 P.M. The z-score is (25.8 − 24.5)/0.56 = 2.32. The area under the standard normal curve between +2.32 and −2.32 is 0.9796, so with probability 0.0204 (1 − 0.9796) the temperature is likely to belong to the NORMAL state.
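The case-study numbers can be reproduced with a short sketch of Eqs. (1)-(2), again using the standard normal CDF (via math.erf) in place of a z-table; the function names are ours, for illustration:

```python
import math

def predict_next(x_t, x_prev):
    """Eq. (1): extrapolate the next reading assuming the same rate of change."""
    return x_t * (1 + (x_t - x_prev) / x_prev)

def transition_probability(x_t, x_prev, mu, sigma):
    """Eq. (2): z-score of the predicted reading, then the area under the
    standard normal curve between -z and +z."""
    z = (predict_next(x_t, x_prev) - mu) / sigma
    return math.erf(abs(z) / math.sqrt(2.0))

# Case study: 25.0 C at 5:00 P.M., 25.8 C at 5:10 P.M.; the NORMAL range has
# mean 24.5 C and standard deviation 0.56 (from the data logs).
x_pred = predict_next(25.8, 25.0)                       # ~26.63 C
p_hot = transition_probability(25.8, 25.0, 24.5, 0.56)  # > 0.99
print(round(x_pred, 2), round(p_hot, 3))
```

The exact probability computed this way differs in the third decimal from the 0.998 read off a z-table in the text; both indicate a near-certain transition to the HOT state.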

H. Scalability Metrics

After the deployment of this system in more than 100 DCs, the number of packets received at the server is more than 3,000,000 per day. Other DCM system metrics are listed below in Table III.

V. CONCLUSION

Complex solutions for data centers are not cost-effective for small enterprises whose business is spread across geographies. Manual monitoring of small DCs may lead to



TABLE III
SCALABILITY METRICS

S.No  Parameter                            Number
1     Data Centers                         102
2     Motes / DC                           4 to 16
3     Frequency of data sensing            1 sec
4     Frequency of data sending to server  30 sec
5     Packets / min at server              2,000 pkts
6     Packets / day at server              3,000,000 pkts
7     Avg alerts per day                   20
8     Users                                1000
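A quick consistency check between the packets-per-minute and packets-per-day rows of Table III:

```python
pkts_per_min = 2000
pkts_per_day = pkts_per_min * 60 * 24
print(pkts_per_day)  # 2880000, i.e. roughly the ~3,000,000 pkts/day reported
```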

human error, loss of data and wastage of electricity. Through hundreds of deployments in different cities we have demonstrated that our proposed DCM solution can reduce manual monitoring costs by 80% and, through online monitoring and real-time alert prediction, allows the DCs to be operated at higher operating levels. With this system, potential energy savings of 10% and a corresponding reduction in carbon emissions are also observed. The effect of outside weather conditions on the chiller level and HVAC set-point temperatures is left for future work. The present version of the system is capable of monitoring and notifying alerts to the concerned people; a feedback loop with actuation capability to set the HVAC based on real-time ambient parameters is planned as future work.

ACKNOWLEDGMENT

The authors would like to thank Jayraj Ugarkar, Lakshya Malhotra and Sougata Sen for their help and valuable inputs during system development and deployment. We would also like to thank Arun Agrahara Somasundara and Chinmoy Mukherjee for their review comments.

REFERENCES

[1] R. Schmidt, M. Iyengar, and R. Chu, “Meeting Data Center Temperature Requirements,” ASHRAE Journal, April 2005.

[2] (2010) www.synapsense.com. [Online]. Available: http://www.synapsense.com

[3] Report from Avtech. Protect your IT facility. [Online]. Available: http://www.avtech.com

[4] K. Langendoen, A. Baggio, and O. Visser, “Murphy loves potatoes: Experiences from a pilot sensor network deployment in precision agriculture,” in Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. IEEE, 2006, p. 8.

[5] X. Wang, G. Xing, Y. Zhang, C. Lu, R. Pless, and C. Gill, “Integrated coverage and connectivity configuration in wireless sensor networks,” in Proceedings of the 1st international conference on Embedded networked sensor systems. ACM, 2003, pp. 28–39.

[6] S. Meguerdichian, F. Koushanfar, M. Potkonjak, and M. Srivastava,“Coverage problems in wireless ad-hoc sensor networks,” in INFOCOM2001, vol. 3. IEEE, 2002, pp. 1380–1387.

[7] S. Kumar, T. Lai, and J. Balogh, “On k-coverage in a mostly sleeping sensor network,” in Proceedings of the 10th annual international conference on Mobile computing and networking. ACM, 2004, pp. 144–158.

[8] A. Howard, M. Mataric, and G. Sukhatme, “Mobile sensor network deployment using potential fields: A distributed, scalable solution to the area coverage problem,” Distributed autonomous robotic systems, vol. 5, pp. 299–308, 2002.

[9] R. Szewczyk, E. Osterweil, J. Polastre, M. Hamilton, A. Mainwaring, and D. Estrin, “Habitat monitoring with sensor networks,” Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004.

[10] T. Wark, C. Crossman, W. Hu, Y. Guo, P. Valencia, P. Sikka, P. Corke, C. Lee, J. Henshall, K. Prayaga et al., “The design and evaluation of a mobile sensor/actuator network for autonomous animal control,” in IPSN 2007. ACM, 2007, pp. 206–215.

[11] R. Szewczyk, A. Mainwaring, J. Polastre, J. Anderson, and D. Culler, “An analysis of a large scale habitat monitoring application,” in Proceedings of the 2nd international conference on Embedded networked sensor systems. ACM, 2004, pp. 214–226.

[12] G. Werner-Allen, J. Johnson, M. Ruiz, J. Lees, and M. Welsh, “Monitoring volcanic eruptions with a wireless sensor network,” in Proceedings of the Second European Workshop on Wireless Sensor Networks, 2005. IEEE, 2005, pp. 108–120.

[13] M. Suzuki, S. Saruwatari, N. Kurata, and H. Morikawa, “A high-density earthquake monitoring system using wireless sensor networks,” in Proceedings of the 5th international conference on Embedded networked sensor systems. ACM, 2007, p. 374.

[14] B. Arrue, A. Ollero, and J. Martinez de Dios, “An intelligent system for false alarm reduction in infrared forest-fire detection,” Intelligent Systems and their Applications, IEEE, vol. 15, no. 3, pp. 64–73, 2002.

[15] E. Jovanov, A. Milenkovic, C. Otto, and P. De Groen, “A wireless body area network of intelligent motion sensors for computer assisted physical rehabilitation,” Journal of NeuroEngineering and Rehabilitation, vol. 2, no. 1, p. 6, 2005.

[16] (2010) www.federspiel.com. [Online]. Available: http://www.federspielcontrols.com

[17] V. Handziski, A. Kopke, A. Willig, and A. Wolisz, “TWIST: a scalable and reconfigurable testbed for wireless indoor experiments with sensor networks,” in Proceedings of the 2nd international workshop on Multi-hop ad hoc networks: from theory to reality. ACM, 2006, pp. 63–70.

[18] A. Mandal, C. Lopes, T. Givargis, A. Haghighat, R. Jurdak, and P. Baldi, “Beep: 3D indoor positioning using audible sound,” in Second IEEE Consumer Communications and Networking Conference (CCNC 2005). IEEE, 2005, pp. 348–353.

[19] M. Pan, C. Tsai, and Y. Tseng, “Emergency guiding and monitoring applications in indoor 3D environments by wireless sensor networks,” International Journal of Sensor Networks, vol. 1, no. 1, pp. 2–10, 2006.

[20] R. Sharma, C. Bash, C. Patel, R. Friedrich, and J. Chase, “Balance of power: Dynamic thermal management for internet data centers,” Internet Computing, IEEE, vol. 9, no. 1, pp. 42–49, 2005.

[21] J. Moore, J. Chase, P. Ranganathan, and R. Sharma, “Making scheduling cool: Temperature-aware workload placement in data centers,” in Proceedings of the annual conference on USENIX Annual Technical Conference. USENIX Association, 2005, p. 5.

[22] B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke, and I. Foster, “Secure, efficient data transport and replica management for high-performance data-intensive computing,” in MSS 2006. IEEE, 2006, p. 13.

[23] C. Liang, J. Liu, L. Luo, A. Terzis, and F. Zhao, “RACNet: a high-fidelity data center sensing network,” in Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems. ACM, 2009, pp. 15–28.

[24] Understanding Fire Hazards in Computer rooms and Data Centers.[Online]. Available: http://www.verst.com.au

[25] Afcon control and automation system. [Online]. Available:http://www.jacarta.com

[26] Interseptor environmental monitoring system. [Online]. Available: http://www.jacarta.com

[27] S. P. Kumar Padmanabh, Sunil K Vuppala, “MOJO: A Middleware that converts Sensor Nodes into Java Objects,” in IEEE CON-WIRE, 2010, Zurich, Switzerland. IEEE.

[28] Report from Energy System Research Unit, University of Strathclyde. Electricity consumption and carbon dioxide. [Online]. Available: http://www.esru.strath.ac.uk

[29] C. Jarque and A. Bera, “Efficient tests for normality, homoscedasticity and serial independence of regression residuals,” Economics Letters, vol. 6, no. 3, pp. 255–259, 1980.

[30] R. Nickerson, “Null hypothesis significance testing: a review of an old and continuing controversy,” Psychological Methods, vol. 5, no. 2, p. 241, 2000.

[31] B. Juang, “Hidden Markov models,” 1985.

[32] Q. Bi, W. Cai, Q. Wang, C. Hang et al., “Advanced controller auto-tuning and its application in HVAC systems,” Control Engineering Practice, vol. 8, no. 6, pp. 633–644, 2000.


