Advanced I/O Techniques for Efficient and Highly Available Process Crash Recovery Protocols Thesis...


Advanced I/O Techniques for Efficient and Highly Available Process Crash Recovery Protocols

Thesis Presentation

Jason Cornwell, 03/15/2011

Agenda

• Introduction
• Challenges
• Pertinent Background
• Proposed Techniques
• Implementations
• Experimental Setup & Results
• Conclusions
• Future Work

Computing-Intensive Applications

Network-Centric Services

Recent Advances

Motivation & Goals

Demand for more computing power and high-bandwidth network connections

Advances in Microprocessors and Networks

Parallel Computing

Performance and Scalability

Reliability and Availability

Simplicity and Accessibility

Agenda: Challenges

Reliability Problems

Large numbers of CPUs, Memory Modules, Hard Disk Drives, Network Interfaces, Network Switches

Low Mean-Time-To-Failure (MTTF) and/or High Failure-In-Time (FIT)
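The link between component count and reliability can be made explicit with a short calculation. This assumes (an assumption added here, not stated on the slide) that components fail independently with constant failure rates and that any single failure interrupts the job; under those conditions the failure rates simply add:

```latex
\mathrm{FIT} = \frac{10^{9}}{\mathrm{MTTF}_{\text{hours}}}, \qquad
\lambda_{\text{sys}} = \sum_{i=1}^{N} \lambda_i = \frac{N}{\mathrm{MTTF}}, \qquad
\mathrm{MTTF}_{\text{sys}} = \frac{1}{\lambda_{\text{sys}}} = \frac{\mathrm{MTTF}}{N}
```

For example, 1,000 identical components, each with an MTTF of 10^6 hours (1,000 FIT each), give a system MTTF of about 1,000 hours, roughly six weeks.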

Classification of Failure

• Transient Failure
  – Power glitch
  – System patch and reboot
  – ECC trap

• Partial “Permanent” Failure
  – Disk failure
  – Partial network failure

• Wholesale “Permanent” Failure
  – Total hardware failure
  – Natural disaster

Availability Problems

Large Numbers of Processes, Threads, Software Barriers, Busy Waiting

Temporarily Unresponsive and/or Unavailable

Agenda: Pertinent Background

Possible Solutions

• Transient Failure
  – Restart/replay/resume on the same node
  – Task migration is possible

• Permanent Partial Failure
  – Rebalance the workload on surviving nodes
  – Partial task migration is needed

• Permanent Wholesale Failure
  – Reconfigure the applications and services
  – Massive task migration to a new platform

Checkpointing

• Common feature in high-performance computing (HPC) platforms

• Saves the execution state

• Application-level or system-level

• Mechanism for task migration

Application vs. System Level

• Application-level Recovery Point
  – Developed specifically for each application
  – Generally smaller footprint
  – Data accessibility restrictions

• Kernel-level Recovery Point
  – Snapshots whole processes
  – Full resource restoration
  – Flexibility due to system-level preemption
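To make the application-level side of this comparison concrete, here is a minimal C sketch of an application-level recovery point. The `struct app_state` fields, `CKPT_MAGIC`, and the `save_state`/`load_state` helpers are hypothetical and not part of BLCR. The programmer chooses which fields to save, which keeps the footprint small, but anything left out (open sockets, other heap data) cannot be restored, unlike a kernel-level snapshot.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical application state: only fields the programmer picks are saved. */
struct app_state {
    uint64_t iteration;   /* loop counter to resume from */
    double   partial_sum; /* result accumulated so far */
};

#define CKPT_MAGIC 0x43504b54u /* arbitrary tag used to validate a checkpoint */

/* Write the state to an open stream; returns 0 on success, -1 on error. */
int save_state(FILE *f, const struct app_state *s)
{
    uint32_t magic = CKPT_MAGIC;

    assert(f != NULL && s != NULL);
    if (fwrite(&magic, sizeof magic, 1, f) != 1)
        return -1;
    if (fwrite(s, sizeof *s, 1, f) != 1)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Read the state back; returns 0 on success, -1 if missing or corrupt. */
int load_state(FILE *f, struct app_state *s)
{
    uint32_t magic;

    assert(f != NULL && s != NULL);
    if (fread(&magic, sizeof magic, 1, f) != 1 || magic != CKPT_MAGIC)
        return -1;
    if (fread(s, sizeof *s, 1, f) != 1)
        return -1;
    return 0;
}
```

On restart, an application would call `load_state` and resume its main loop from `iteration`, starting from scratch if the check fails. Writing the raw struct keeps the sketch short but ties the file to one architecture and compiler.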

Berkeley Lab Checkpoint/Restart (BLCR)

• System-level

• Kernel module

• Checkpoint creation implemented

• Process recovery implemented

• Applications linked to BLCR libraries at execution time

• Stores checkpoint data locally (stack, heap, registers, signals, etc.)

Agenda: Proposed Techniques

Contribution

• Enhanced BLCR performance through a latency-tolerant technique

• Increased BLCR availability through a novel checkpoint creation technique

I/O Optimization

• Avoided extensive modification to BLCR

• Reduced the disk latency of checkpoint creation

• Implemented a caching technique

• Improved I/O performance fourfold or more

• System overhead under 300 KB in experimental tests

Checkpoint Caching

• Buffer used as temporary storage

• Buffered data flushed to storage in large blocks

• Trade-off between resource consumption and improved I/O efficiency

cr_copy(chkptData, count)
    if (chkptBuf is NULL)
        kmalloc count bytes for chkptBuf;
        copy chkptData into chkptBuf;
    else
        kmalloc (chkptBuf size + count) bytes for tempBuf;
        copy chkptBuf into tempBuf;
        krealloc chkptBuf to its expanded size;
        memmove tempBuf into chkptBuf;
        kfree tempBuf;
        append chkptData to the end of chkptBuf;
    end if
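The cr_copy pseudocode can be approximated in user space as below. This is a sketch, not BLCR code: `struct ckpt_cache`, `cache_copy`, and `cache_free` are hypothetical names, and malloc-family calls stand in for kmalloc/krealloc/kfree. Because realloc, like krealloc, preserves the existing bytes when it grows a block, the sketch appends directly instead of staging through a temporary buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for the in-kernel checkpoint buffer. */
struct ckpt_cache {
    char  *buf; /* cached checkpoint bytes; NULL before the first copy */
    size_t len; /* number of valid bytes in buf */
};

/* Append count bytes of checkpoint data to the cache. Returns 0 on success,
 * -1 on overflow or allocation failure (the existing cache stays intact). */
int cache_copy(struct ckpt_cache *c, const void *data, size_t count)
{
    char *grown;

    assert(c != NULL && (data != NULL || count == 0));
    if (count == 0)
        return 0;
    if (count > SIZE_MAX - c->len)
        return -1;                    /* new size would overflow size_t */
    /* realloc(NULL, n) acts like malloc(n), covering the "chkptBuf is NULL"
     * branch; otherwise it grows the block and keeps the old bytes. */
    grown = realloc(c->buf, c->len + count);
    if (grown == NULL)
        return -1;
    memcpy(grown + c->len, data, count);
    c->buf = grown;
    c->len += count;
    return 0;
}

/* Release the cache once its contents have been flushed to storage. */
void cache_free(struct ckpt_cache *c)
{
    free(c->buf);
    c->buf = NULL;
    c->len = 0;
}
```

Growing by exactly `count` bytes can trigger a reallocation on every call; doubling the capacity instead would amortize that cost at the price of more idle memory, the same resource versus efficiency trade-off listed above.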

Optimized Write Operation

Remote Checkpoint

• BLCR is limited to local disk storage

• Remote checkpoint offers off-site storage option

• Uses sockets to transmit data

• Requires a predefined destination server

• Outperforms BLCR in some experimental tests

Remote Checkpoint Server

• Single-threaded daemon

• Compiled with GCC

• Stores the recovery point outside the client node

• Could be ported to Microsoft Windows

while (true)
    create socket;
    bind to address;
    listen for incoming connections;
    accept client connection;
    create checkpoint file descriptor;
    while (data received from client)
        write checkpoint data to file;
    end while
    close file descriptor;
    close socket;
end while
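A minimal POSIX C sketch of the server loop follows, assuming IPv4, one client at a time, and a single output path. Unlike the pseudocode, it creates the listening socket once and reuses it for every client, which is the usual arrangement. `serve_checkpoints`, `receive_checkpoint`, and the `.partial` temporary-file convention are hypothetical. Writing to a temporary file and renaming it after `fsync` keeps a server-side write error from overwriting the last good checkpoint; a client that dies mid-stream still looks like a normal end of file, because this protocol has no length header.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Copy everything the client sends on conn_fd into out_fd until the client
 * closes its end. Returns the number of bytes stored, or -1 on error. */
long receive_checkpoint(int conn_fd, int out_fd)
{
    char buf[65536];
    long total = 0;
    ssize_t n;

    assert(conn_fd >= 0 && out_fd >= 0);
    while ((n = read(conn_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return -1;
        }
        for (ssize_t off = 0; off < n;) { /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}

/* Hypothetical single-threaded daemon: serves one client at a time and keeps
 * the most recent checkpoint in `path`. Returns only on setup failure. */
int serve_checkpoints(uint16_t port, const char *path)
{
    char tmp[4096];
    int one = 1;
    struct sockaddr_in addr;
    int lfd;

    if (snprintf(tmp, sizeof tmp, "%s.partial", path) >= (int)sizeof tmp)
        return -1;
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 8) < 0) {
        close(lfd);
        return -1;
    }
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            continue;                 /* transient accept error: keep serving */
        int out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (out >= 0) {
            int ok = receive_checkpoint(cfd, out) >= 0 && fsync(out) == 0;
            if (close(out) != 0)
                ok = 0;
            if (ok)                   /* replace the old checkpoint only on success */
                rename(tmp, path);
        }
        close(cfd);
    }
}
```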

Modified Write Operation

• Data sent as TCP segments

• Small writes may be held by TCP until a full-sized segment accumulates (Nagle's algorithm) before delivery

• The only modification is to BLCR's write operation

if (remote chkpt)
    if (socket is NULL)
        create socket;
        establish connection; if the handshake fails, break and perform original_chkpt;
    end if
    package checkpoint data;
    send data message;
end if

if (original_chkpt)
    original BLCR write operation;
end if
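The client side of the modified write operation might look like the sketch below, assuming an IPv4 server address known in advance (the predefined destination). `connect_remote` and `ckpt_write` are hypothetical helpers, not BLCR functions. As in the pseudocode, the decision to fall back to the original local write is made once, when the handshake fails; after that, a send error is returned rather than switching destinations halfway through a checkpoint.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write all len bytes, retrying short or interrupted writes; 0 on success. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Handshake with the predefined checkpoint server (ip/port are hypothetical
 * parameters). Returns a connected socket, or -1 so the caller falls back. */
int connect_remote(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    int fd;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                    /* not a dotted-quad IPv4 address */
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Modified write: remote_fd >= 0 means the handshake succeeded, so data goes
 * to the server; remote_fd < 0 means use the original local write path. */
int ckpt_write(int remote_fd, int local_fd, const void *data, size_t count)
{
    assert(data != NULL || count == 0);
    if (remote_fd >= 0)
        return write_all(remote_fd, data, count);
    return write_all(local_fd, data, count);
}
```

A caller would run something like `int rfd = connect_remote("192.0.2.10", 5000);` once per checkpoint (address and port are placeholders) and pass `rfd`, possibly -1, to every `ckpt_write` call. Writing to a socket whose peer has gone away raises SIGPIPE by default, so a real client would ignore that signal or use `send()` with Linux's `MSG_NOSIGNAL` flag.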

Agenda: Implementations

Design

I/O Optimization Write

write(chkptData, count)
    if (chkptBuf has space for the incoming chkptData)
        cr_copy(chkptData, count);
    else
        vfs_write(chkptBuf);
        vfs_write(chkptData);
        kfree(chkptBuf);
    end if
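The buffered write path can be sketched in user-space C as follows, with `write()` standing in for `vfs_write` and a fixed-size array standing in for chkptBuf. `CACHE_CAP`, `struct write_cache`, `cached_write`, and `cache_flush` are hypothetical; the thesis does not state the buffer size it used. Small pieces of checkpoint data accumulate in memory, so the disk sees one large write per flush instead of many small ones.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define CACHE_CAP 4096 /* hypothetical cache size; not taken from the thesis */

struct write_cache {
    char   buf[CACHE_CAP]; /* cached checkpoint bytes (the chkptBuf role) */
    size_t len;            /* bytes currently cached */
    int    fd;             /* checkpoint file the cache drains into */
};

/* Write all n bytes, retrying short or interrupted writes; 0 on success. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Drain the cache to the file in one large write and mark it empty. */
int cache_flush(struct write_cache *c)
{
    if (c->len > 0 && write_all(c->fd, c->buf, c->len) != 0)
        return -1;
    c->len = 0;
    return 0;
}

/* Cached write: if the incoming piece fits, copy it into the cache (no disk
 * I/O); otherwise flush the cache, then write the piece directly. */
int cached_write(struct write_cache *c, const void *data, size_t count)
{
    assert(c != NULL && c->len <= CACHE_CAP);
    if (count == 0)
        return 0;
    if (count <= CACHE_CAP - c->len) {
        memcpy(c->buf + c->len, data, count);
        c->len += count;
        return 0;
    }
    if (cache_flush(c) != 0)
        return -1;
    return write_all(c->fd, data, count);
}
```

A caller must invoke `cache_flush` once after the last piece so the tail of the checkpoint reaches the file. The sketch mirrors the pseudocode by writing a piece that does not fit directly after flushing; an alternative is to cache that piece too when it is smaller than `CACHE_CAP`.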

Remote Checkpoint Write

Agenda: Experimental Setup & Results

Experimental Setup

I/O Optimization

• Dell Workstation, 3.06 GHz Intel Pentium 4, 1 GB Memory, 5,400 RPM Hard Disk, Linux 2.6

• BLCR Implementation
• Optimized BLCR (O-BLCR) Implementation

Remote Checkpoint

• Dell PowerEdge 700, 2.80 GHz Dual-processor Intel Pentium 4, 3 GB Memory, 5,400 RPM Hard Disk, Linux 2.6

• Dell Workstation, 3.06 GHz Intel Pentium 4, 1 GB Memory, 5,400 RPM Hard Disk, Linux 2.6

• BLCR Implementation
• BLCR with NFS (BLCR+NFS)
• BLCR with our Remote Checkpoint Technique (BLCR+R)

Benchmarks

Program

• NP-Complete
• Data Encryption
• Linear Equation Solver
• File Compression

Resource Utilization

Benchmark   CPU      Memory   I/O
TSP         High     Low      Low
AES         High     Low      Medium
GE          Low      High     High
HC          Medium   Medium   Medium

I/O Optimization Results

Remote Checkpoint Results

Agenda: Conclusions

Conclusion

• Minimal modification to BLCR

• The I/O optimization technique reduced BLCR's write latency

• The remote checkpoint feature increases BLCR availability

• These techniques should be integrated into the core BLCR source code

Agenda: Future Work

Future Work

• Server authentication protocol

• Data packet encryption

• Automated process load balancing

Questions
