Backdoors: A Remote Healing Architecture for Cluster-based Systems
Florin Sultan
Laboratory for Network Centric Computing
http://discolab.rutgers.edu
Feb 12, 2004 COANS Seminar Spring 2004 5
…Not Good for All
[Figure: Web browsing from a client to http://www.Bank.com over TCP/IP.]
What Do We Need?
- Monitor system health
  - OS/application failures
  - DoS attack, overload
  - intrusion
- Take action to heal the system
  - repair damaged state, clean up corrupted state
  - extract and recover good state
  - contain fault/attack
  - repel intrusion
- Where should these operations be performed?
Self-Healing
- Consumes processor cycles (intrusive)
- Relies on processor availability
  - hang failures make healing impossible
- Relies on OS resources
  - sensitive to resource depletion/unavailability
- Relies on system integrity
  - state may be corrupted
  - system may be compromised by an attacker
Alternative: Remote Healing
- Perform healing from another system
  - target system must allow remote access
  - the monitor system must be trusted
- Can we make remote healing nonintrusive?
  - no extra load on the target system
  - no reliance on target resources (processor, OS, etc.)
Target Failures
- OS/application hangs or cannot sustain service
  - hw: processor, network, disk, etc.
  - OS: driver bug, deadlock, resource exhaustion, etc.
  - DoS attack, overload
- Memory still available, yet not accessible via conventional paths (IP stack, console, etc.)
- Solution
  - monitor and detect failures
  - recover or repair software state of the affected system
The Backdoor
backdoor: a hidden software or hardware mechanism, usually created for testing and troubleshooting
--American National Standard for Telecommunications
The Backdoor (BD) Architecture
[Figure: a Target System (processor, memory, I/O devices) connected through the Backdoor to a Monitor System.]
Outline
- Introduction
- Remote Healing in Clusters of Computers
- Backdoor Architecture
- Case Study: Recovery in Internet Services
- Prototype
- Conclusions
Internet Services Today
- Commercial shift in using the Internet
  - e-commerce, banking, trading, auctioning, etc.
  - transactional, time-critical services
  - economic incentive for fault tolerance and service continuity
[Figure: Web browsing from a client to http://www.Bank.com over TCP/IP.]
Cluster-based Internet Services
[Figure: a cluster of three servers, each paired with a monitor, serving four clients.]
Remote Healing in Clusters
- Goal: survivability of live service state
  - OS and application-specific
- Target: state critical to service continuity
- Remote monitoring and diagnosis
  - detect failure, bad state, attack, intrusion
- Remote intervention
  - recovery of useful state from failed nodes
  - in-place repair of bad state
Backdoor-based Remote Healing
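[Figure: four cluster nodes, each showing P, M, and I/O components with a Backdoor (BD), connected by a private secure network. Nodes are labeled as targets (T) and monitors (M).]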
Backdoor Architecture Principles
1. Bidirectional access
   - both remote input and output operations must be supported
2. Remote memory access
   - memory must be accessible remotely
   - remote I/O?
3. Availability
   - failure must not impair BD
4. Nonintrusive operation
   - BD operations must not involve processors of the target system
Backdoor Architecture Principles (cont)
5. Transparency
   - BD operation must not be visible to target
6. Access control
   - monitor and target negotiate access permissions at the beginning
   - target cannot “close” the BD afterwards
7. Tamper resistance
   - target cannot modify the result of a BD operation

Question: How can we implement Backdoor using existing technologies?
Remote Memory Communication (RMC)
- Remote DMA (RDMA) Read/Write operations
- Remote processor not involved
- RMC-based networking technologies: VIA, InfiniBand, etc.
[Figure: an RDMA Write from the RDMA initiator to the target; each node has a CPU, memory, and an RMC NIC. Other labels: OS, NIC.]
Backdoor with RMC
[Figure: Monitor and Target nodes, each with a CPU, memory, and an RMC NIC. Operations shown from monitor to target: MONITOR (RDMA-R) and REPAIR/RECOVER (RDMA-R/W).]
RMC Compliance with BD Principles
Principle               RMC
Bidirectional access    Y
Remote memory access    Y
Availability            Y
Nonintrusiveness        Y
Transparency            Y?
Access control          Y-
Tamper resistance       Y
Remote Healing Architecture
[Figure: target (T) and monitor (M) connected by the BD, which provides nonintrusive remote access. Labels: OS, Application, Critical state, API, Monitoring, Repair/Recovery, Detection, Action, Passive Gateway, Active Gateway.]
Monitoring over RMC-BD
[Figure: Target and Monitor nodes, each with a CPU, memory, and an RMC NIC. Labels: Externalized State (target), Remote view and Detection (monitor).]
Monitor: progress, anomalous events, integrity constraints, etc.
Repair over RMC-BD
[Figure: Target and Monitor nodes, each with a CPU, memory, and an RMC NIC. Labels: Correct State, Repaired State.]
Recovery over RMC-BD
[Figure: Target and Monitor nodes, each with a CPU, memory, and an RMC NIC. Labels: Recoverable State, Recovered State.]
And Possibly More…
- Remote control of I/O devices
  - access state in peripheral devices, e.g., OS swap space
- Dynamically inject code/data in a live system
  - test, diagnosis, repair handlers
  - fast system reboot through OS memory overlay
  - fast restart of application components (micro-reboot)
- Monitor for intrusion/attack detection
Case Study: Recovery In Internet Services
- Remote healing is not just RMC!
  - RMC provides just a way of access
- Requires OS support
- Failure Detection
- Session Recovery
OS Support
- Monitoring: Progress Box (PB)
  - progress counter: {scalar value, update deadline}
  - PB = set of progress counters in OS memory
  - API to allocate and update progress counters in PB
  - monitor reads PB, checks counters, detects stalls
- Recovery: State Box (SB)
  - encapsulates per-session server state
  - API to export/import application state to/from SB
  - backup node reads SB, reinstates session, resumes service
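
As a rough illustration of the Progress Box idea, the sketch below keeps counters in one flat byte region, the kind of layout a remote reader could fetch without calling into the target. Everything here is an assumption for illustration: the `ProgressBox` class, its `allocate`/`update`/`read` methods, the 16-byte record layout, and the reading of "update deadline" as an interval in milliseconds are hypothetical and do not come from the prototype.

```python
import struct

# Assumed record layout: counter value, then update deadline in ms (both uint64).
COUNTER_FMT = "<QQ"
COUNTER_SIZE = struct.calcsize(COUNTER_FMT)


class ProgressBox:
    """Fixed-size region of progress counters, standing in for OS memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.region = bytearray(capacity * COUNTER_SIZE)

    def allocate(self, deadline_ms):
        """Reserve the next free slot and return its index."""
        if self.used == self.capacity:
            raise MemoryError("Progress Box is full")
        index = self.used
        struct.pack_into(COUNTER_FMT, self.region, index * COUNTER_SIZE,
                         0, deadline_ms)
        self.used += 1
        return index

    def update(self, index):
        """Increment a counter in place to signal progress."""
        offset = index * COUNTER_SIZE
        value, deadline_ms = struct.unpack_from(COUNTER_FMT, self.region, offset)
        struct.pack_into(COUNTER_FMT, self.region, offset, value + 1, deadline_ms)

    def read(self, index):
        """Return (value, deadline_ms) for one counter."""
        return struct.unpack_from(COUNTER_FMT, self.region, index * COUNTER_SIZE)


pb = ProgressBox(capacity=4)
ctx_switches = pb.allocate(deadline_ms=10)
pb.update(ctx_switches)
pb.update(ctx_switches)
print(pb.read(ctx_switches))
```

Keeping the counters in a contiguous, fixed-format region means a monitor only needs the base address and the record size to interpret what it reads.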
Failure Detection with PB
- Target system updates progress counters in PB
  - Examples: interrupts (global, per-device), context switches, connections accepted, etc.
- Monitor process
  - scans remote PB, checks counters, detects stalls
[Figure: the Monitor reads the Progress Box in the Target OS through the BD.]
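
The monitor-side check can be sketched as follows. This is a minimal sketch under stated assumptions, not the prototype's detector: `StallDetector`, `Watch`, and the snapshot format (counter name mapped to a `(value, deadline_ms)` pair) are hypothetical, and a counter is treated as stalled when its value has not changed for longer than its deadline.

```python
from dataclasses import dataclass


@dataclass
class Watch:
    last_value: int       # value seen at the most recent change
    last_change_ms: int   # monitor-local time of that change


class StallDetector:
    """Tracks successive Progress Box snapshots and flags stalled counters."""

    def __init__(self):
        self.watches = {}

    def scan(self, snapshot, now_ms):
        """snapshot maps name -> (value, deadline_ms); returns stalled names."""
        stalled = []
        for name, (value, deadline_ms) in snapshot.items():
            watch = self.watches.get(name)
            if watch is None or value != watch.last_value:
                # First sighting or observed progress: restart the clock.
                self.watches[name] = Watch(value, now_ms)
            elif now_ms - watch.last_change_ms > deadline_ms:
                stalled.append(name)
        return stalled


detector = StallDetector()
detector.scan({"interrupts": (5, 10), "ctx_switches": (7, 10)}, now_ms=0)
detector.scan({"interrupts": (6, 10), "ctx_switches": (7, 10)}, now_ms=8)
print(detector.scan({"interrupts": (7, 10), "ctx_switches": (7, 10)}, now_ms=16))
```

The detector only uses the monitor's own clock, so it does not depend on the target's notion of time, which may be unreliable once the target has hung.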
Recovery With SB
- Fine-grained, essential service state
- Application-specific components (SB_APP)
  - E.g., document name, offset in document, etc.
- OS-specific components (SB_IO)
  - E.g., send/receive TCP buffers
- An SB can be distributed over multiple processes (multi-tier servers)
- Backup node extracts SB from a failed node and reinstates it locally
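
The export, extract, and reinstate steps might look roughly like the sketch below. It is an illustration only: the `StateBox` class, its method names, the JSON encoding, and the `reinstate` helper are hypothetical, and the real State Box lives in kernel memory rather than in a Python object.

```python
import json


class StateBox:
    """Per-session container with application (SB_APP) and I/O (SB_IO) parts."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.app = {}
        self.io = {"send": "", "recv": ""}

    def export_app(self, **fields):
        """Application exports its essential state, e.g. document and offset."""
        self.app.update(fields)

    def export_io(self, send_buf=b"", recv_buf=b""):
        """Record pending TCP buffer contents (hex-encoded for JSON)."""
        self.io = {"send": send_buf.hex(), "recv": recv_buf.hex()}

    def to_bytes(self):
        """Flat image of the SB, as a backup node might extract it."""
        return json.dumps(
            {"id": self.session_id, "app": self.app, "io": self.io}
        ).encode()

    @classmethod
    def from_bytes(cls, raw):
        data = json.loads(raw.decode())
        sb = cls(data["id"])
        sb.app = data["app"]
        sb.io = data["io"]
        return sb


def reinstate(raw):
    """Backup-side step: rebuild a session from an extracted SB image."""
    sb = StateBox.from_bytes(raw)
    return {
        "session_id": sb.session_id,
        "document": sb.app["document"],
        "offset": sb.app["offset"],
        "unsent": bytes.fromhex(sb.io["send"]),
    }


sb = StateBox("s1")
sb.export_app(document="index.html", offset=4096)
sb.export_io(send_buf=b"tail")
print(reinstate(sb.to_bytes()))
```

Because only the fine-grained session state is captured, the image stays small, which fits the slide's emphasis on essential state rather than whole-process checkpoints.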
SB Structure
[Figure: State Boxes SB1, SB2, SB3 for clients C1, C2, C3, spanning a front-end server process and a back-end server process. Labels: App state (SB_APP); TCP state, pipe state (SB_IO).]
Backdoors Prototype
- Implemented using Myrinet NICs with modified firmware
  - remote Read/Write DMA
  - remote OS locking (syscalls, interrupt handlers)
- Modified FreeBSD kernel
  - Progress Box
  - State Box
- Modified server applications
A Realistic Sample Application: Multi-tier Auction Service (RUBiS)
- Front-End (FE): Apache web server
- Middle Tier (MT): Tomcat + JBoss
- Back-End: MySQL DB
Recoverable RUBiS
SB = {reqID, req}
SB = {reqID, tid, result}
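
The slide does not say how these fields are used, so the following is only one plausible reading, offered as a sketch: if a tier records {reqID, tid, result} for each completed request, a backup that inherits those records can answer a replayed request from the saved result instead of executing the database update a second time. The `MiddleTier` class, its `handle` method, and the placeholder result string are hypothetical.

```python
class MiddleTier:
    """Toy tier that remembers completed requests by reqID."""

    def __init__(self, state_boxes=None):
        # reqID -> {"tid": transaction id, "result": saved reply}
        self.state_boxes = dict(state_boxes or {})
        highest = max((sb["tid"] for sb in self.state_boxes.values()), default=0)
        self.next_tid = highest + 1
        self.executed = []  # (tid, req) pairs actually run by this instance

    def handle(self, req_id, req):
        saved = self.state_boxes.get(req_id)
        if saved is not None:
            # Replayed request: return the recorded result, run nothing.
            return saved["result"]
        tid = self.next_tid
        self.next_tid += 1
        result = f"done:{req}"  # stands in for the real DB transaction
        self.executed.append((tid, req))
        self.state_boxes[req_id] = {"tid": tid, "result": result}
        return result


primary = MiddleTier()
first = primary.handle(17, "bid item=3")

# After a failure, a backup starts from the state boxes it recovered.
backup = MiddleTier(primary.state_boxes)
replayed = backup.handle(17, "bid item=3")
print(first == replayed, backup.executed)
```

Under this reading, the reqID is what lets recovery stay invisible to the client while the update is applied only once.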
Experimental Evaluation
- 2.4 GHz, 1 GB RAM, 1 Gbps Ethernet, Myrinet LanaiX 133 MHz PCI
- Fault injection
  - synthetic freeze: halt CPU, disable device interrupts, disable network interface, trap to kernel debugger
  - emulated crashes in buggy network drivers
- Experiments
  - Microbenchmarks
  - Failover correctness
  - Failover throughput and latency
Microbenchmarks
- Monitor CPU usage, sampling a 100-counter PB
  - 46% worst-case (infinite loop)
  - < 5% @ 10 ms, < 1% @ 100 ms
  - High sampling rates possible
- Low overhead SB API
  - export/import: < 30 us
  - extract + reinstate a 10 KB front-end SB: 358 us
Failure-free Overhead
[Figure: throughput in requests/min (0 to 8,000) versus number of clients (20 to 1,100) for three configurations: Base, Recoverable FE, Recoverable FE+MT.]
Failover Correctness
- Workload/run: 600 requests from 200 clients
  - request = DB queries + DB table update
- Two correctness tests across crash & recovery
  - End-to-end consistency (crash invisible to client)
  - Database integrity (exactly-once semantics preserved)
- All crash-test runs were validated
Related Work
- DEC WRL Titan system [Mogul ’86]
- Recovery Box [Baker ’93]
- Rio reliable file cache [Chen ’96]
- Online OS reconfiguration [Soules ’03]
- Virtual machines [Bressoud ’95, Dunlap ’02]
- Automatic repair of data structures [Demsky ’03]
Conclusions
- Backdoor: system architecture for nonintrusive remote healing
  - monitoring without using processor cycles
  - repair, recovery even when remote processor is not available
- BD prototype for transparent recovery of active service sessions in cluster-based Internet services
Current and Future Work
- Remote repair of OS state
- OS support and API for healing-conscious applications
  - programmer performs application-specific monitoring, repair and recovery
- Securing the BD
  - low-level access control through BD Guard entities implemented in firmware
- Remote control of I/O devices
The People Behind Backdoors
- Aniruddha Bohra
- Stephen Smaldone
- Yufei Pan
- Iulian Neamtiu (Maryland)
- Pascal Gallard (IRISA/INRIA)
- Liviu Iftode