Post on 14-Dec-2015
Advanced I/O Techniques for Efficient and Highly Available
Process Crash Recovery Protocols
Thesis Presentation
Jason Cornwell
03/15/2011
Agenda
• Introduction
• Challenges
• Pertinent Background
• Proposed Techniques
• Implementations
• Experimental Setup & Results
• Conclusions
• Future Work
Computing Intensive Applications
Network Centric Services
Recent Advances
Motivation & Goals
Demand for more computing power and high-bandwidth network connections
Advances in Microprocessors and Networks
Parallel Computing
Performance and Scalability
Reliability and Availability
Simplicity and Accessibility
Reliability Problems
Large numbers of CPUs, Memory Modules, Hard Disk Drives, Network Interfaces, Network Switches
Low Mean-Time-To-Failure (MTTF) and/or High Failure-In-Time (FIT)
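To illustrate why large component counts translate into a low system MTTF, here is a minimal arithmetic sketch. It assumes independent components with constant failure rates, which the slides do not state, so failure rates simply add. FIT is the usual count of failures per 10^9 device-hours. The disk count and per-disk MTTF are made-up numbers for illustration only.

```python
HOURS_PER_BILLION = 1e9  # FIT counts failures per 10^9 device-hours


def fit_from_mttf(mttf_hours: float) -> float:
    """Convert an MTTF in hours to a FIT rate (constant failure rate assumed)."""
    return HOURS_PER_BILLION / mttf_hours


def system_mttf(component_mttf_hours: float, count: int) -> float:
    """MTTF of `count` identical, independent components where any failure counts.

    Failure rates add, so the system fails `count` times as often as one part.
    """
    return component_mttf_hours / count


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 disks, each with a 1,000,000-hour MTTF.
    disk_mttf = 1_000_000.0
    print(fit_from_mttf(disk_mttf))        # per-disk FIT rate
    print(system_mttf(disk_mttf, 10_000))  # mean hours between failures somewhere in the system
```

Under these assumptions a part that fails once in over a century on its own still yields a failure somewhere in the system every few days.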
Classification of Failure
• Transient Failure
– Power glitch
– System patch and reboot
– ECC trap
• Partial “Permanent” Failure
– Disk failure
– Partial network failure
• Wholesale “Permanent” Failure
– Total hardware failure
– Natural disaster
Availability Problems
Large numbers of Processes, Threads, Software Barriers, Busy Waiting
Temporarily Unresponsive and/or Unavailable
Possible Solutions
• Transient Failure
– Restart/replay/resume on the same node
– Task-migration is possible
• Permanent Partial Failure
– Rebalance the workload on surviving nodes
– Partial task-migration is needed
• Permanent Wholesale Failure
– Reconfigure the applications and services
– Massive task-migration to new platform
Checkpointing
• Common feature in high-performance computing (HPC) platforms
• Saves the execution state
• Application or system-level
• Mechanism for task migration
Application vs System Level
• Application-level Recovery Point
– Developed specifically for each application
– Generally smaller footprint
– Data accessibility restrictions
• Kernel-level Recovery Point
– Snapshot processes
– Full resource restoration
– Flexibility due to system-level preemption
Berkeley Lab Checkpoint/Restart (BLCR)
• System-level
• Kernel-module
• Checkpoint creation implemented
• Process recovery implemented
• Linked to BLCR libraries at execution
• Stores checkpoint data locally (stack, heap, registers, signals, etc.)
Contribution
• Enhanced BLCR performance through a latency-tolerant technique
• Increased BLCR availability through a novel checkpoint creation technique
I/O Optimization
• Avoids extensive modification to BLCR
• Reduces the disk latency of checkpoint creation
• Implements a caching technique
• Improves I/O performance 4-fold or more
• System overhead of less than 300 KB in experimental test results
Checkpoint Caching
• Buffer used as temporary storage
• Buffered data flushed to storage in large blocks
• Trade-off between resource consumption and improved I/O efficiency
cr_copy(chkptData, count)
    if(chkptBuf is NULL)
        kmalloc size of count for chkptBuf space;
        copy chkptData into chkptBuf;
    else
        kmalloc size of count + chkptBuf size for tempBuf space;
        copy chkptBuf into tempBuf;
        copy chkptData into tempBuf after the existing data;
        krealloc chkptBuf for its expanded size;
        memmove tempBuf into chkptBuf;
        kfree memory for tempBuf;
    end if
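The accumulation step above can be sketched in user space. This is only an analogue under stated assumptions: the class name `CheckpointBuffer` is hypothetical, and a Python `bytearray` stands in for the kernel allocation calls, so it shows the buffering logic rather than the BLCR kernel-module code.

```python
class CheckpointBuffer:
    """User-space analogue of the chkptBuf accumulation step (hypothetical name)."""

    def __init__(self) -> None:
        self.buf = None  # mirrors chkptBuf being NULL before the first copy

    def cr_copy(self, chkpt_data: bytes) -> int:
        """Append chkpt_data to the buffer and return the new buffered size."""
        if self.buf is None:
            # First call: allocate the buffer and copy the data in.
            self.buf = bytearray(chkpt_data)
        else:
            # Later calls: grow the buffer and append after the existing data.
            self.buf.extend(chkpt_data)
        return len(self.buf)


if __name__ == "__main__":
    cache = CheckpointBuffer()
    cache.cr_copy(b"registers")
    cache.cr_copy(b"+stack")
    print(bytes(cache.buf))
```

A growable array avoids the temporary copy that the pseudocode makes, at the cost of occasional internal reallocation.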
Optimized Write Operation
Remote Checkpoint
• BLCR is limited to local disk storage
• Remote checkpoint offers off-site storage option
• Uses sockets to transmit data
• Needs predefined destination
• Outperforms BLCR in some experimental tests
Remote Checkpoint Server
• Single-threaded daemon
• Compiled with GCC
• Stores the recovery point external to the client node
• Could be ported to a Microsoft Windows derivative
while(true)
    create socket;
    bind to address;
    listen for incoming connections;
    wait for client to connect;
    create file descriptor;
    while(buffered data is received)
        write checkpoint data;
    end while
    close file descriptor;
    close socket;
end while
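The server loop above can be approximated with a short user-space sketch. The function names, the 64 KiB receive size, and the loopback address are illustrative assumptions, not details from the presentation, and the sketch handles a single client per call instead of looping forever.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create, bind, and listen; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_one_checkpoint(listener: socket.socket, out_path: str) -> int:
    """Accept one client, write everything it sends to out_path, return the byte count."""
    conn, _addr = listener.accept()          # wait for a client to connect
    total = 0
    with conn, open(out_path, "wb") as out:  # open the checkpoint file
        while True:
            chunk = conn.recv(65536)
            if not chunk:                    # empty read: the client closed the connection
                break
            out.write(chunk)                 # write checkpoint data
            total += len(chunk)
    return total
```

A daemon in the style of the slides would call `serve_one_checkpoint` inside an endless loop, choosing a new output path for each connection.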
Modified Write Operation
• TCP packets
• MTU must be reached before delivery
• Only modification is to the write operation of BLCR
if(remote chkpt)
    if(socket is NULL)
        create socket;
        establish connection; if the handshake fails, break and perform the original_chkpt;
    end if
    package checkpoint data;
    send data message;
end if
if(original_chkpt)
    original BLCR write operation;
end if
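The fallback behavior described above (try the remote destination, and use the original local write if the connection cannot be made) can be sketched as follows. The class name `RemoteCheckpointWriter`, the timeout, and the choice to stop retrying after the first failure are assumptions made for illustration; the presentation does not specify them.

```python
import socket


class RemoteCheckpointWriter:
    """Send checkpoint data to a remote server, falling back to a local file."""

    def __init__(self, local_path, remote=None, timeout=2.0):
        self.local_path = local_path
        self.remote = remote    # (host, port) tuple or None: the predefined destination
        self.timeout = timeout
        self.sock = None        # created lazily on the first remote write

    def write(self, data: bytes) -> str:
        """Write one block; return "remote" or "local" to say which path was used."""
        if self.remote is not None:
            try:
                if self.sock is None:
                    self.sock = socket.create_connection(self.remote, timeout=self.timeout)
                self.sock.sendall(data)
                return "remote"
            except OSError:
                # Connection or send failed: give up on remote and use the local path.
                self.close()
                self.remote = None
        with open(self.local_path, "ab") as f:
            f.write(data)
        return "local"

    def close(self) -> None:
        if self.sock is not None:
            self.sock.close()
            self.sock = None
```

If a send fails partway through, the server may hold a partial checkpoint; a fuller design would have the server discard incomplete files.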
Design
I/O Optimization Write
write(chkptData, count)
    if(chkptBuf has space for the incoming chkptData)
        cr_copy(chkptData, count);
    else
        vfs_write(chkptBuf);
        vfs_write(chkptData);
        kfree(chkptBuf);
        chkptBuf = NULL;
    end if
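A user-space sketch of this buffered write path may help show where the savings come from. The class name, the `capacity` parameter, and the `sink_writes` counter are hypothetical; `sink.write` stands in for `vfs_write`, and the sketch adds a `close` step that flushes leftover data, which the slide does not show.

```python
import io


class CachedCheckpointWriter:
    """Accumulate small checkpoint writes and flush them to the sink in bulk."""

    def __init__(self, sink, capacity: int) -> None:
        self.sink = sink          # any object with write(bytes); stands in for vfs_write
        self.capacity = capacity  # hypothetical buffer limit in bytes
        self.buf = bytearray()
        self.sink_writes = 0      # number of writes that actually reached the sink

    def _sink_write(self, data) -> None:
        if data:
            self.sink.write(bytes(data))
            self.sink_writes += 1

    def write(self, chkpt_data: bytes) -> None:
        if len(self.buf) + len(chkpt_data) <= self.capacity:
            self.buf.extend(chkpt_data)   # room left: cache the data
        else:
            self._sink_write(self.buf)    # flush what was cached
            self._sink_write(chkpt_data)  # then write the incoming data directly
            self.buf = bytearray()        # start over with an empty buffer

    def close(self) -> None:
        self._sink_write(self.buf)        # flush whatever is still cached
        self.buf = bytearray()


if __name__ == "__main__":
    sink = io.BytesIO()
    writer = CachedCheckpointWriter(sink, capacity=8)
    for chunk in (b"aaa", b"bbb", b"cccc", b"dd"):
        writer.write(chunk)
    writer.close()
    print(writer.sink_writes, sink.getvalue())
```

In this run four incoming chunks reach the sink in three writes. A larger `capacity` reduces the number of sink writes further while holding more memory, which is the trade-off the Checkpoint Caching slide names.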
Remote Checkpoint Write
Experimental Setup
I/O Optimization
• Dell Workstation, 3.06 GHz Intel Pentium 4, 1 GB Memory, 5,400 RPM Hard Disk, Linux 2.6
• BLCR Implementation
• Optimized BLCR (O-BLCR) Implementation
Remote Checkpoint
• Dell PowerEdge 700, 2.80 GHz Dual-processor Intel Pentium 4, 3 GB Memory, 5,400 RPM Hard Disk, Linux 2.6
• Dell Workstation, 3.06 GHz Intel Pentium 4, 1 GB Memory, 5,400 RPM Hard Disk, Linux 2.6
• BLCR Implementation
• BLCR with NFS (BLCR+NFS)
• BLCR with our Remote Checkpoint Technique (BLCR+R)
Benchmarks
Program
• NP-Complete
• Data Encryption
• Linear Equation Solver
• File Compression
Resource Utilization
Benchmark   CPU      Memory   I/O
TSP         High     Low      Low
AES         High     Low      Medium
GE          Low      High     High
HC          Medium   Medium   Medium
I/O Optimization Results
Remote Checkpoint Results
Conclusion
• Minimal modification to BLCR
• I/O optimization technique reduced the write latency of BLCR
• Remote checkpoint increases BLCR availability by adding an off-site storage option
• These techniques should be integrated into the core BLCR source code
Future Work
• Server authentication protocol
• Data packet encryption
• Automated process load balancing
Questions