© 2016 PT&C Forensic Consulting Services, P.A.
Expert Tips for Investigating IT Equipment Failures
Tom Bonse, Project Manager
Jared Fegan, Project Consultant
Our Background in Technology
• Electrical Engineers
• Mechanical Engineers
• Computer/Network Hardware Experts
• Software Engineers
• Network Security Experts
• EnCE-certified Computer Forensics Experts
• Certified Ethical Hackers
• IPC-certified application specialists
• Various Microsoft certifications
• Various other certifications
The Course Agenda
• Starting the claims investigation
• What are common types of IT equipment?
– Components of this equipment
• Common Perils
• Evaluation and testing of the equipment
• Hard Disk Drives and common failure modes
• What is equipment restoration?
– Science and feasibility
– Techniques used
• Case studies
• Q&A - Conclusion
What Makes IT Equipment Losses Unique?
• Increased Business Interruption/Extra Expense
• Multiple Stakeholders
• Warranty Issues
• Certification Issues
• Privacy Issues
• Different Manufacturer Philosophies
Starting the claim process
• Identifying the equipment
• Understanding the claimed event
• Determining if this event is plausible
• Supporting OR differing data regarding the events of the loss
• Examination of the equipment claimed
• Determination of the cause of the claimed event
Approaching A Server Loss
• Types of Server and Storage Equipment
– Rack Mount Server
– Tower Server
– NAS / SAN / File Server
– Storage Array
Approaching a Point of Sale (PoS) Loss
• Sales Terminals
• Peripherals – Printers/Scanners, etc.
• Backend server
• Software
Approaching a Dental/Medical Loss
• Equipment upgrades
• Software Capability
Interior of a Rack Mount Server
Power Supply
Redundant Array of Independent Disks (RAID) Controller
Processors
Motherboard
Interior of a Tower Server
Processor
Power Supply
Redundant Array of Independent Disks (RAID) Controller
SCSI Backplane
Motherboard
Servers within servers?
• Software-based servers that reside on the same physical piece of equipment.
– Referred to as a Virtual Machine or a Virtual Server. A virtual machine (VM) is an isolated software container that can run its own operating system (OS) and applications as if it were a physical computer. A VM behaves like a physical computer and contains its own virtual central processing unit (CPU), random access memory (RAM), hard disk drives (HDDs) and network interface card (NIC). The VM shares the resources (RAM, CPU, storage space) of the physical server according to the configuration set when the VM is created.
– No fixed limit on the number of VMs on the same equipment (limited only by available resources).
– Shared resources (processing, memory, etc.)
– Each VM appears to other equipment as if it were located on its own server.
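The "limited by resources" point can be illustrated with a minimal Python sketch. It assumes a simplified model in which each VM's RAM and disk assignment is fixed and must fit on the host; real hypervisors can overcommit resources, and the `Allocation` type and all figures here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """RAM and disk either offered by a host or assigned to a VM (hypothetical model)."""
    ram_gb: int
    disk_gb: int


def fits_on_host(host, vms):
    """Return True if the combined RAM and disk assigned to the VMs fit on the host."""
    total_ram = sum(vm.ram_gb for vm in vms)
    total_disk = sum(vm.disk_gb for vm in vms)
    return total_ram <= host.ram_gb and total_disk <= host.disk_gb


# Hypothetical physical server and four identical VMs.
host = Allocation(ram_gb=64, disk_gb=2000)
four_vms = [Allocation(ram_gb=16, disk_gb=400)] * 4

print(fits_on_host(host, four_vms))                       # four VMs use all 64 GB of RAM
print(fits_on_host(host, four_vms + [Allocation(8, 100)]))  # a fifth VM exceeds the RAM
```

For a claim, the practical takeaway is that one damaged physical server may represent several logical servers, each with its own operating system and data.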
Four VMs on a single server
Why is asking if VMs are present important?
What is RAID?
• Redundant Array of Independent Disks
OR
• Redundant Array of Inexpensive Disks
RAID is a technology that uses two (2) or more hard disk drives simultaneously to achieve greater levels of performance, reliability or redundancy, and/or larger data volume sizes.
What is a RAID Controller?
• A RAID controller is a device which manages the physical hard disk drives and presents them to the computer as logical units. It almost always implements hardware RAID and often provides additional disk cache.
• Many RAID controllers require uninterrupted power for their onboard memory, because all of the configuration settings are stored in random access memory (RAM).
• Not to be confused with a Host Bus Adapter (HBA) which has no RAID configuration abilities.
Typical Types of RAID Arrays
• RAID 1
RAID 1, also known as a mirror, consists of a maximum of two (2) hard disk drives (HDDs) that write the same data simultaneously. This provides fault tolerance against HDD errors and/or failures that can be experienced during normal operation. As long as no more than one (1) of the HDDs fails, the server can provide uninterrupted service by operating on the remaining HDD.
• RAID 5
A RAID 5 configuration requires a minimum of three (3) HDDs that write data across all the HDDs simultaneously. This provides fault tolerance in the event that no more than one (1) HDD experiences errors and/or failures during normal operation. In that event, the server will continue to provide uninterrupted service by operating on the remaining HDDs. While operating this way, the RAID configuration will be identified as being in a degraded state. If a second HDD goes offline, the RAID 5 will exceed its built-in redundancy.
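The single-drive fault tolerance of RAID 5 comes from parity. The sketch below is a simplified illustration, not a controller implementation: it treats one stripe on a hypothetical three-drive array as two data blocks plus an XOR parity block, and shows that any one missing block can be rebuilt from the other two. The block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical three-drive RAID 5: two data blocks plus parity.
data_a = b"CLAIM001"
data_b = b"POLICY42"
parity = xor_blocks([data_a, data_b])

# The drive holding data_b goes offline (degraded state).
# Its contents are rebuilt from the two surviving drives.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)  # True
```

If a second drive were lost, only one block of the stripe would remain and the XOR could no longer be solved, which is why a second offline HDD exceeds the redundancy of the array.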
Failures Experienced
• Lightning Damage
• Power Surge
• Loss of Power to surrounding area, or Battery Backup (UPS) issues
• Cooling/Environmental Control System failure
• Equipment in high contamination area or exposed to contaminants
• Natural Disaster – Tornado, flooding, etc.
• Vandalism
• Hard Disk Drive Failures
• RAID Controller Failure
• Software Corruption
• Application Errors
• Human Errors
Lightning
$686 billion in lightning losses each year
29% had no lightning detected on loss date
Insurance industry is left to determine the cause of the damage
Types of Lightning Damage
Direct Strike
Strike hits property, causing visible damage.
Inductive Coupling
The magnetic fields created by nearby lightning strikes induce a voltage in long cables (30 feet or longer), which can damage components at the ends of the cables.
How Lightning Damages
Lightning requires a point of entry into a piece of equipment (power, phone, or network cables, etc.)
Lightning will follow an electrical path inside the equipment
Lightning typically causes catastrophic and instantaneous damage
Path of Power
• Examination of equipment will identify an electrical power path into the device.
• Uninterruptible Power Supply (UPS)
• Surge Protector
• Power Strip or Power Distribution Unit (PDU)
• Power Supply
Power Surge
A large sudden increase of voltage (or current). This would cause damage to the components in the power path.
Can be caused by events as major as a short circuit in the utility equipment or as subtle as a neighboring facility turning on an air conditioning unit.
Lightning vs. Power Surge
Lightning will provide a high amount of energy for a short period of time.
Power Surge will provide a lower amount of energy than lightning for a longer amount of time.
[Figure: energy versus time for lightning and a power surge]
Equipment Examination
• Examination of equipment will identify an electrical path into the device.
• There may not always be viewable damage, especially with inductive coupling events. In some cases, the failed operation of a portion of the computer may be the only indication of an issue.
So, how can we test to verify the root cause of damage if we cannot turn it on?
Visual Inspection
Visual inspection can reveal problems such as degraded / leaking capacitors or other physical damage.
Critical components are missing.
Component Level Equipment Testing
• Break down the device into subcomponents:
– Power Supply
– Motherboard
– RAM
– Processor
– Hard Disk Drive
– Etc.
Verification of Operation
• Each of the items can be placed into a testing system and validated for proper operation
• The testing completed is above and beyond normal everyday operation. Therefore, it will discover a failure that might not be discovered during normal system operation.
• Extended testing in a controlled environment helps expose possible latent or triggered failures.
• In many cases, a logical error can be deliberately forced, so that the level of operation and/or response from a device can be verified.
• This testing, in conjunction with the mounted components, can determine the exact mode of failure.
Gathering of operational data
• SATA & IDE HDDs have Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) logs.
• SCSI and SAS HDDs have P/G (primary and grown defect) lists and log files.
• Servers can have management logs and event logs.
• Operating Systems have various system and application logs that can be reviewed.
• Servers and networked devices may support the Intelligent Platform Management Interface (IPMI), which can track, identify, and respond to hardware events on the managed devices.
• Each obtained log will provide historical operational statistics that can be reviewed for errors or issues.
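As an illustration of reviewing a log for errors, the sketch below scans a set of S.M.A.R.T. raw values for attributes that are widely used as media-degradation indicators. The watch list, the "nonzero means worth a closer look" rule, and the sample values are assumptions for illustration; attribute meaning and scaling vary by drive manufacturer, and the values would first have to be exported from the drive with a diagnostic tool.

```python
# Attribute IDs commonly associated with media degradation (assumed watch list).
WATCHED = {
    5: "Reallocated Sectors Count",
    197: "Current Pending Sector Count",
    198: "Offline Uncorrectable Sector Count",
}


def flag_smart(raw_values):
    """Return descriptions of watched attributes whose raw value is nonzero."""
    return [
        f"{WATCHED[attr_id]} = {raw}"
        for attr_id, raw in sorted(raw_values.items())
        if attr_id in WATCHED and raw > 0
    ]


# Hypothetical raw values keyed by attribute ID (9 = power-on hours, 194 = temperature).
sample_log = {5: 12, 9: 30412, 194: 41, 197: 0, 198: 3}

for finding in flag_smart(sample_log):
    print(finding)
```

A flagged attribute is a prompt for further testing, not a conclusion about cause; the history in the log still has to be compared against the claimed date of loss.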
Examination
Following the gathering of the logs, etc.:
• Powering of the equipment
• Errors presented during the boot process
• Software/Application errors presented
Focus on the subsystems to identify failures and causes of these failures. Review all available data to recreate the incident on the date of loss.
A Hard Disk Drive
• Typical HDDs in today's computer equipment:
1. Small Computer System Interface (SCSI)
2. Integrated Drive Electronics (IDE)
3. Serial Advanced Technology Attachment (SATA)
- Solid State Drive (SSD)
4. Serial Attached SCSI (SAS)
Internal Components of a HDD
Air Filter Packet
Actuator Axis or Head Stack
Magnet
Actuator
Platters
Spindle
Read/Write Heads
Actuator Arm
Preamp Chip
Ribbon Cable
Internal Components of a HDD
Actuator Arm
Read / Write Head
Recording Surface - Platter
Map of the Platter
(A) Track: A track is a circular path on the surface of a disk.
(B) Sector: A sector can be thought of as a wedge-shaped area of a disk. The term sector, however, is more often used as a synonym for block.
(C) Block: The intersection of a track and a sector is called a block. Blocks are the smallest unit of storage on an HDD. Block = 512 bytes.
(D) Cluster: A cluster is the logical amount of disk space that can be allocated to hold a file. The smallest size of a cluster is one (1) sector.
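The relationship between blocks and clusters can be worked through with a short calculation. This sketch uses the 512-byte block from the slide and assumes, purely for illustration, a cluster of eight sectors (4,096 bytes); the actual cluster size depends on how the volume was formatted.

```python
BLOCK_BYTES = 512  # block size stated on the slide


def clusters_needed(file_bytes, sectors_per_cluster=8):
    """Return (clusters allocated, unused slack bytes) for a file of the given size.

    sectors_per_cluster=8 is an assumed example value, not a fixed standard.
    """
    cluster_bytes = sectors_per_cluster * BLOCK_BYTES
    clusters = (file_bytes + cluster_bytes - 1) // cluster_bytes  # round up
    slack = clusters * cluster_bytes - file_bytes
    return clusters, slack


print(clusters_needed(10_000))  # (3, 2288)
```

A 10,000-byte file therefore occupies three 4,096-byte clusters, leaving 2,288 bytes of allocated but unused space in the last cluster.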
Data Losses for Consideration
• Where was the data stored? (Storage Method)
• What data was lost? (Database, images, documents, etc.)
• What kind of peril caused the data to be lost?
– Hardware error - component failure
– Software error - software update, error in program
– Environment - over temperature, natural disaster, etc.
– Power event - power loss corrupts data, configuration and software
– Human error - deletion of data or a virtual machine, poor maintenance, incorrect maintenance
• Professional data recovery services are a means to recover data that is otherwise deemed unrecoverable.
– Typically between $800 and $2,000 per drive for normal service.
– Upcharge for faster service.
– Each Virtual Machine (VM) is often treated as its own recovery, which increases the cost of data recovery.
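A rough budget range follows directly from the per-recovery figures above. The sketch below multiplies the quoted normal-service range by the number of billable recoveries; how a vendor counts recoveries (per drive, per VM, or otherwise) is an assumption that must be confirmed with the vendor, and expedited-service upcharges are not modeled.

```python
def recovery_cost_range(recoveries, low=800, high=2000):
    """Return (low, high) estimated cost in dollars for a number of billable recoveries.

    The defaults are the normal-service range quoted on the slide.
    """
    return recoveries * low, recoveries * high


# Hypothetical case: four VMs, each billed as its own recovery.
print(recovery_cost_range(4))  # (3200, 8000)
```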
Reasons for Drive Failures
• Degraded sectors / bad sectors
• Head crash
• Intermittently failing read/write heads
• Damaged read/write heads
• Head position tracking issues
• Firmware (software) corruption
• Known issues
Bad Sectors
A bad sector is a common occurrence involving a sector of the hard disk drive (HDD) media failing either due to physical damage of the media, or a deterioration of the magnetic media on which information is stored. Because of this failure, information cannot be written to the given area of the drive media.
Head Crash
A head crash is where a read/write head makes contact with the platter during operation. Because of the speed of rotation of the platters and the cross axial movement of the read/write heads, the event of the read/write heads touching the platter is often catastrophic. Evidence of this event is identified with concentric rings or arcs viewable on the surface of the platters within the insured’s HDD. This is where the read/write heads are actually scratching or removing the magnetic layer from the platter. In addition to this type of failure, the read/write head sustains irreparable damage due to the delicacy of the components.
Types of Contamination
Fire examples: flammable gas leaks such as silane, electrical, flammable solvent fires, furnace
Chemical spills: burst chemical supply line, leaking gas pipe, operator chemical spill
Water spillage: burst process cooling water, burst deionized water or ultra-pure water pipes, condensation, roof leaks
Construction dust: ceiling tile installation/removal, concrete floor install/repair
Contamination Perspective
• 0.01 µm ≤ Tobacco smoke ≤ 1.00 µm
Effects of Contamination Including Corrosion
Contamination effects include:
o Cosmetic damage, odor, cosmetic change, obscuration, mechanical binding, short circuits/arcing, thermal dissipation, increased contact resistance and especially corrosion (see below).
Corrosion and Corrosives
o Water and water vapor (including humidity) combine with ions to form corrosive acids. Example: H2O + Cl = HCl (Hydrochloric Acid)
o Avoid contact with zinc, brass, galvanized iron, aluminum, copper and copper alloys, since violent reactions occur.
o Elevated temperatures and humidity will cause the reactivity to accelerate. Reducing these environmental influences can reduce surface deterioration.
Corrosive Ions in Smoke
o Sulfates - From burning wood, cardboard, paper, etc.
o Nitrates - From burning nylon carpets, drapes, and certain plastics
o Chlorides - From burning plastics, such as PVC and electrical wiring
Restoration of Contaminated Equipment
Feasibility of Restoration (Is restoration a viable option?)
Equipment loss professional should look at the following:
• Heat damage, arcing, corrosion, physical damage
• Conditions of loss site
• Circumstances surrounding the equipment
• Concerns regarding business interruption or extra expense
• Surface wipe sample and surface conductivity test results
Restorable - No signs of corrosion, surface conductivity tests low
Not restorable - Contamination effects caused excessive corrosion, arcing and overheating
Science of Restoration
DOE Study vs. IPC Standard
1. DOE study threshold is 20 µg/in2 of aggregate chloride equivalent. More suitable for manufacturing facilities, machine shops, etc.
2. IPC J-Standard threshold is 10.06 µg/in2 of aggregate sodium chloride equivalent. More suitable for data centers, medical facilities, etc.
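Lab results are sometimes reported per square centimeter, so a unit conversion is needed before comparing against these thresholds. The sketch below converts µg/cm2 to µg/in2 (1 in2 = 6.4516 cm2) and lists which of the two thresholds a reading exceeds. It simplifies by treating the reading as directly comparable to both thresholds, and the function names are illustrative; the 24.9 µg/cm2 input is the surface conductivity figure shown elsewhere in this deck.

```python
CM2_PER_IN2 = 6.4516  # exact: 2.54 cm per inch, squared

IPC_LIMIT = 10.06  # µg/in2, IPC J-Standard threshold from the slide
DOE_LIMIT = 20.0   # µg/in2, DOE study threshold from the slide


def to_ug_per_in2(ug_per_cm2):
    """Convert a surface contamination reading from µg/cm2 to µg/in2."""
    return ug_per_cm2 * CM2_PER_IN2


def exceeded_thresholds(ug_per_in2):
    """Return the names of the thresholds that the reading exceeds."""
    names = []
    if ug_per_in2 > IPC_LIMIT:
        names.append("IPC")
    if ug_per_in2 > DOE_LIMIT:
        names.append("DOE")
    return names


reading = to_ug_per_in2(24.9)
print(round(reading, 1), exceeded_thresholds(reading))  # 160.6 ['IPC', 'DOE']
```

Exceeding a threshold indicates that cleaning is needed to bring contamination below the limit; it does not by itself establish that the equipment cannot be restored.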
IT equipment Special Considerations
Special considerations:
• BI costs can be enormous - quick, decisive action is essential.
• Contamination incidents - extremely important to accurately determine the extent of contamination (the insured normally cannot do this). Expert + analytical lab services required to support assessment.
• Requires a very good understanding of the manufacturing technologies, chemicals and gases used, as well as cleanroom and facilities experience.
• Lateral thinking for solving problems.
Fact or Fiction?
• All manufacturers condemn equipment if contaminated by water or smoke? FICTION
In fact:
– Siemens Medical along with Allianz started equipment restoration in the 1970s.
– Third party service companies are utilized to perform the needed repairs and reinstate their service contracts.
• All electronic circuit boards are damaged if contaminated by water. FICTION
In fact:
– Many electronic circuit boards are water resistant with conformal coating.
– De-ionized water is an integral part in the manufacturing of electronic circuit boards.
Provided by Aqueous Technologies
Equipment Restoration Basics
• Can be completed on pieces affected by:
– Smoke and soot
– Water or excessive humidity
– Construction dust
– Chemical contamination
• Restoration is a viable option if:
– Minimal heat damage
– Minimal arcing or short circuiting
– Low levels of chloride corrosion/oxidation
Surface conductivity – Sodium Chloride (NaCl) (24.9 µg/cm2 = 160 µg/in2)
Laboratory wipe sample test results from a 24-port switch indicate Sodium Chloride equivalent > 10.06 µg/in2
Equipment Restoration Techniques
• Dry Techniques - HEPA vacuum, agitation with brushes, low pressure deionized compressed air.
• Modified Wet - A combination of dry techniques as well as hand detail utilizing aqueous spray solutions.
• Aqueous Wet - Thorough disassembly of power supplies, control circuitry and mechanical assemblies. Aqueous wash utilizing deionized water and cleaning solutions.
• Overnight drying - Where applicable, in a heated chamber to remove moisture. In addition, a vacuum chamber can be used to enhance moisture removal and reduce drying time.
TekPro - Vacuum drying chamber
Why Consider Restoration as a Strategy for an IT Equipment Claim?
1. Cost efficient
– Less than 30% of replacement costs in most cases
2. Reduces down time and BI losses
– Restoration can often be completed faster than replacement, especially with high end medical pieces
– Reduces installation and configuration costs
– Removes need to train personnel on new equipment
Preserving Your Options
• Disconnect Power
• Control Humidity
• Use Temporary Barriers
• Remove Excess Water
• Consider a Preservative
• Retain Professional Advice
• Mechanical Preservation - Lubricating Agents
• Electronics Preservation
Case Study 1
CLAIM: Over-temperature event causes servers to shut down, and they now will not reboot properly
• Logs obtained from the HDDs showed that no temperature limits were exceeded. Subsequent testing revealed no errors or failures.
• Server logs identify a thermal event (high limit reached).
• The thermal event triggered a thermal shutdown of the server in order to protect the server from damage.
– Once shut down, the server falls under its storage temperature specifications.
• Testing of the hardware separate from the software shows full functionality.
• The shutdown was not graceful but abrupt, which caused damage to OS files.
• Server does not need to be replaced.
• OS needs to be reloaded; server is restorable. Database is not damaged and can be fully restored to use.
Case Study 2
CLAIM: Multiple hard disk drive (HDD) failures on a server
– Server experienced multiple HDD failures simultaneously
– RAID controller failure produced inconsistent reporting of the HDDs
– HDDs sent to a data recovery facility, which identified that the HDDs were not physically damaged
– Obtained the server for testing
– Identified HDDs are fully operational
– Server main components are fully operational
– No logs on the server or RAID controller
– Put the server and HDDs together and tested
• Reviewed the RAID controller
• Found that the RAID controller was reporting the physical HDDs inconsistently
– Physical drive – two (2) HDDs offline
– Logical drive – one online, second offline
– Rebooted – all online and system RAID was Optimal
– OS is corrupted and requires reloading
Case Study 3
CLAIM: Tornado damaged buildings, including a server room full of servers and other network equipment.
• Inventory obtained for claimed equipment
• Visual inspection of claimed equipment, including internal inspection
• Wipe samples of contaminants taken to obtain lab results and see if equipment is restorable
• Testing of equipment to determine whether units are functional/operational
• Send wipe samples and obtain lab results
• Share lab results and restoration options with insured and clients to inform them and obtain buy-in to the process
• Provide list of equipment that needs to be replaced and what can be restored
• Provide quotes for restoration and replacement equipment
• Begin restoration and turnkey approach to restoring insured's equipment to pre-loss condition
• Insured's operation restored before critical time period with all parties satisfied with results
Questions?
Jared Fegan
919.328.0793