MP-PIPE for Soybean Proteome
Brad Barnes, 27/11/15
COMP 5704
Problem
High-protein plant, grown in Canada
Contains over 70,000 different proteins
Unsigned short: 2^16 = 65,536 values (maximum 65,535)
Unable to use with PIPE
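The limit can be illustrated with a short sketch. It assumes, as the slide implies, that protein indices are held in a 16-bit unsigned integer; the function and variable names are hypothetical, and Python's `ctypes` stands in for the C type.

```python
import ctypes

# Width of a C unsigned short on this platform (16 bits on common systems).
USHORT_BITS = 8 * ctypes.sizeof(ctypes.c_ushort)
USHORT_MAX = 2 ** USHORT_BITS - 1  # 65535 when the type is 16 bits wide


def fits_in_ushort(protein_count: int) -> bool:
    """Return True if indices 0..protein_count-1 all fit in an unsigned short."""
    return protein_count - 1 <= USHORT_MAX


def store_as_ushort(index: int) -> int:
    """Mimic assigning to a C unsigned short: out-of-range values wrap silently."""
    return ctypes.c_ushort(index).value


# "Over 70,000" per the slide; the exact count is not given, so 70,000 is a stand-in.
soybean_proteins = 70_000
print(fits_in_ushort(soybean_proteins))  # False
print(store_as_ushort(69_999))           # 4463, i.e. 69999 - 65536
```

The wrapped value shows why the failure is easy to miss: the index is still valid, it just points at the wrong protein.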
The Cluster
18 nodes
32 GB RAM
8 core processors
100 GB SSD
Source: www.dehne.net
PIPE Pipeline
1. Prepare data
2. Run genTab to build database
3. Run MP-PIPE to predict interactions
Memory
Error: Process killed with signal 9 (SIGKILL)
Running out of memory
Top output: [screenshot not reproduced]
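A minimal sketch of how a wrapper script could recognise this kind of failure on a POSIX system. The helper name `describe_exit` is hypothetical, and the child process here is a stand-in that kills itself rather than a real PIPE run.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a readable message (POSIX only)."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as the return code -N.
        sig = signal.Signals(-returncode)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {returncode}"


# Stand-in child that sends itself SIGKILL, the signal used by,
# for example, the Linux out-of-memory killer.
child = [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
result = subprocess.run(child)
print(describe_exit(result.returncode))
```

The same helper reports signal 11 as SIGSEGV on Linux, which separates an out-of-memory kill from a segmentation fault.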
Logging
Errors in regular PIPE: Process killed with signal 11 (Segmentation Fault)
Need logging to file!
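As an illustration of logging to file, here is a sketch using Python's standard `logging` module. It is not the logging code added to PIPE; the logger name, file name, and messages are all assumptions.

```python
import logging
import os
import tempfile


def make_file_logger(path: str) -> logging.Logger:
    """Create a logger that appends timestamped records to the given file."""
    logger = logging.getLogger("pipe_run")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Write to a temporary directory so the sketch leaves no files behind in the cwd.
log_path = os.path.join(tempfile.mkdtemp(), "pipe.log")
log = make_file_logger(log_path)
log.info("starting lookup for protein %d", 42)
log.error("lookup failed for protein %d", 42)

with open(log_path) as fh:
    print(fh.read())
```

The file handler flushes after every record, so the lines written before a crash remain in the file and show how far the run got.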
Debugging
Debug in single-threaded mode
Attach gdb debugger to file
Traced error to hash table lookup (with a very long protein)
Issue: very large protein sizes lead to integer overflow
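The slides do not show the expression that overflowed. The sketch below assumes, purely for illustration, a table index computed as `row * width + col` and stored in a 32-bit signed integer; all names are hypothetical, and `ctypes.c_int32` is used to mimic the wraparound.

```python
import ctypes

INT32_MAX = 2 ** 31 - 1  # 2147483647


def index_int32(row: int, width: int, col: int) -> int:
    """Compute row * width + col as a 32-bit signed int would hold it.

    In C, signed overflow is undefined behaviour; wraparound is the
    result commonly seen in practice and is what is mimicked here.
    """
    return ctypes.c_int32(row * width + col).value


def index_checked(row: int, width: int, col: int) -> int:
    """Compute the same index, but fail loudly instead of wrapping."""
    exact = row * width + col
    if exact > INT32_MAX:
        raise OverflowError(f"index {exact} does not fit in 32 bits")
    return exact


print(index_int32(1_000, 1_000, 0))    # 1000000
print(index_int32(50_000, 50_000, 0))  # -1794967296
```

A negative index used for a table lookup is one way an overflow of this kind can end in a segmentation fault. Widening the type (for example to a 64-bit integer) or checking the range before the lookup are two common remedies.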
Testing
“The principal objective of software testing is to give confidence in the software.” – Anonymous
Small datasets with known results
Large dataset for final test
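A check against known results might look like the following sketch. The helper `find_mismatches`, the protein identifiers, and the scores are invented for illustration and are not PIPE output.

```python
import math


def find_mismatches(predicted, expected, tol=1e-6):
    """Return the pairs whose predicted score is missing or differs from the known one."""
    mismatches = []
    for pair, known in expected.items():
        got = predicted.get(pair)
        if got is None or not math.isclose(got, known, abs_tol=tol):
            mismatches.append(pair)
    return sorted(mismatches)


# Made-up scores for a tiny dataset with known results.
expected = {("P1", "P2"): 0.91, ("P1", "P3"): 0.12}
predicted_ok = {("P1", "P2"): 0.91, ("P1", "P3"): 0.12}
predicted_bad = {("P1", "P2"): 0.91, ("P1", "P3"): 0.40}

print(find_mismatches(predicted_ok, expected))   # []
print(find_mismatches(predicted_bad, expected))  # [('P1', 'P3')]
```

Running the same comparison before and after each change to the code turns the small datasets into a regression test.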
Performance Tuning
Number of threads vs. speedup, MP-PIPE
Source: (A. Schoenrock et al. 2011)
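Speedup in a plot like this is conventionally the single-thread runtime divided by the runtime with n threads, and efficiency is speedup divided by n. The sketch below computes both; the timings are hypothetical placeholders, not measurements from MP-PIPE.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = runtime with one thread / runtime with n threads."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, threads: int) -> float:
    """Parallel efficiency = speedup / number of threads (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / threads


# Hypothetical runtimes in seconds, keyed by thread count.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

for n, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, n)
    print(f"{n} threads: speedup {s:.2f}, efficiency {e:.2f}")
```

Efficiency falling well below 1.0 as threads are added suggests that overhead or a shared resource, such as memory bandwidth, is limiting the gain.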
Version Control
Checkpoint results
Work on different things
Conclusion
Modified PIPE to work for soybeans
genTab limited by memory => doubled runtime
MP-PIPE performance constant
Validated with tests
Added logging to file
Fixed integer overflow issue
Questions
1. What was the issue with PIPE?
2. How were changes verified?
3. What’s one useful tool for software development?