Date posted: 14-Apr-2017
Category: Science
Uploaded by: berenice-batut
ASaiM: Lessons learned from developing a framework for biologists
Bérénice Batut, October 16th, 2015
PhD thesis in bioinformatics and computational biology
Contribution to the aevol project
Development of simple Python scripts
Post-doc in bioinformatics
Development of the ASaiM project
ASaiM project
Objectives
Development of a bioinformatics environment to analyze data from gut microbiota
Gut microbiota
Community of microorganism species that live in the digestive tract
"Forgotten" organ
Metagenomics: study of microbiota
Complexity
Short sequences
Sequence variability
Incomplete reference databases
Need for numerous treatments to extract useful information
[Figure: example of a workflow to sort sequences given their type. Steps recoverable from the diagram: input sequences pass through SortMeRNA runs against rRNA Populus sequences, rRNA Sclerotinia sequences, and Silva bacteria 16S, Silva archaea 16S and Silva eukaryota 18S sequences. Populus blast hits with identity > 97% are filtered on coverage and target position, and the sequences that are not conserved are concatenated with the non-Populus sequences. Ids and e-values are extracted from the 16S and 18S blast reports (after removing the first line), compared for similarity and difference, and joined. For ids found in both reports, the lines where one e-value is lower than the other are extracted, the id lists are concatenated, and the sequences whose ids are in each list are extracted as the 16S and 18S sequences to conserve.]
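One step of this workflow, deciding whether a sequence that matches both the 16S and the 18S databases is kept as 16S or as 18S, can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary-based inputs, and the handling of ties (left unassigned, since the diagram only names the two strict comparisons) are assumptions, not the scripts used in the project.

```python
def assign_by_evalue(evalues_16s, evalues_18s):
    """Split sequence ids between 16S and 18S using the lower e-value.

    Both arguments map a sequence id to its best e-value in the
    corresponding blast report (hypothetical input format).
    """
    ids_16s, ids_18s = set(evalues_16s), set(evalues_18s)

    # "Compare difference": ids found in only one report are kept as-is.
    keep_16s = ids_16s - ids_18s
    keep_18s = ids_18s - ids_16s

    # "Compare similarity" + "Join columns": ids found in both reports.
    for seq_id in ids_16s & ids_18s:
        if evalues_16s[seq_id] < evalues_18s[seq_id]:
            keep_16s.add(seq_id)
        elif evalues_18s[seq_id] < evalues_16s[seq_id]:
            keep_18s.add(seq_id)
        # Equal e-values: left unassigned in this sketch (assumption).

    return keep_16s, keep_18s


keep_16s, keep_18s = assign_by_evalue(
    {"a": 1e-10, "b": 1e-5, "c": 1e-3},
    {"b": 1e-8, "c": 1e-3, "d": 1e-20},
)
```

Even this single decision combines several elementary operations (extract, compare, join, filter), which is why the full workflow needs so many tools.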
ASaiM framework
Bioinformatics framework to generate workflows to analyze data from gut microbiota
Main requirements
Generation of workflows with numerous tools
Easy to use
Flexibility
Heavily and easily documented
Easy to maintain
First tested approach
Simple Python scripts
Fit with framework requirements?
Generation of workflows with numerous tools
Easy to use
Flexibility
Heavily and easily documented
Easy to maintain
Second tested approach
Workflow managers such as Luigi, Airflow, ...
Airflow dependency graph (from Airbnb site)
Fit with framework requirements?
Generation of workflows with numerous tools
Easy to use
Flexibility
Heavily and easily documented
Easy to maintain
Third tested approach
Homemade approach
Configuration file
Workflow description
Web interface for generation
Python scripts to execute the workflow in the configuration file
Fit with framework requirements?
Generation of workflows with numerous tools
Easy to use
Flexibility
Heavily and easily documented
Easy to maintain
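The homemade approach described above (a configuration file describing the workflow, plus Python scripts that execute it) can be sketched roughly as follows. Everything here is hypothetical: the JSON layout, the `TOOLS` registry and the step names are invented for illustration and are not the project's actual format.

```python
import json

# Hypothetical registry mapping tool names to Python callables.
TOOLS = {
    "remove_first_line": lambda lines: lines[1:],
    "extract_id": lambda lines: [line.split("\t")[0] for line in lines],
}

# Hypothetical workflow description, as it might be stored in a config file.
CONFIG = json.loads("""
{
  "steps": [
    {"tool": "remove_first_line", "input": "blast_report", "output": "report_body"},
    {"tool": "extract_id", "input": "report_body", "output": "ids"}
  ]
}
""")


def run_workflow(config, datasets):
    """Run each configured step in the listed order on named datasets."""
    data = dict(datasets)  # do not modify the caller's dictionary
    for step in config["steps"]:
        tool = TOOLS[step["tool"]]
        data[step["output"]] = tool(data[step["input"]])
    return data


result = run_workflow(
    CONFIG, {"blast_report": ["id\tevalue", "s1\t1e-5", "s2\t1e-3"]}
)
```

Note that this sketch runs the steps strictly in the order they are written, so the author of the configuration file is responsible for listing them in an order where every input already exists.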
Main issue with these approaches
Dependency between the tasks
Airflow dependency graph (from Airbnb site)
[Figure: the workflow to sort sequences given their type, shown in full and then as a subset of its steps (SortMeRNA runs, extraction of ids and e-values from the 16S and 18S blast reports, compare and join steps).]
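To make the difference between task dependency and input/output dependency concrete, here is a minimal sketch in which the execution order is not declared by hand but derived from which step produces which dataset. The step names and the `STEPS` layout are hypothetical, and this is not a description of how any of the tools named in the slides works internally; it assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical steps, described only by the datasets they read and write.
STEPS = {
    "compare_similarity": {
        "inputs": ["16S sequence id", "18S sequence id"],
        "outputs": ["16S similar id"],
    },
    "extract_16s_ids": {
        "inputs": ["16S blast report"],
        "outputs": ["16S sequence id"],
    },
    "extract_18s_ids": {
        "inputs": ["18S blast report"],
        "outputs": ["18S sequence id"],
    },
}


def execution_order(steps):
    """Order steps so that every dataset is produced before it is consumed."""
    # Which step produces each dataset?
    producer = {
        dataset: name
        for name, step in steps.items()
        for dataset in step["outputs"]
    }
    # A step depends on the producers of its inputs; inputs with no
    # producer are treated as external datasets supplied by the user.
    graph = {
        name: {producer[d] for d in step["inputs"] if d in producer}
        for name, step in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(STEPS)
```

With this kind of description, adding or removing a tool only requires declaring its inputs and outputs; the ordering of the whole workflow follows from the data.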
Final approach
Galaxy
Open-source project based on Python
International development community
Web interface
Galaxy ToolShed
Fit with framework requirements?
Generation of workflows with numerous tools
Easy to use
Flexibility
Heavily and easily documented
Easy to maintain
Galaxy dependency graph
[Figure: Galaxy workflow to sort sequences given their type. Input datasets feed a chain of Galaxy tools connected through their inputs and outputs: Extract (constrained) information from similarity search reports, Remove beginning, Compare two Datasets, Join two Datasets, Cut, Filter, Concatenate datasets and Extract (sequence file with constraints on sequences), with Line/Word/Character count applied to intermediate outputs.]
ASaiM framework
Configuration of a Galaxy server
Development of wrappers for tool integration
Development of scripts to use Galaxy and its API
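As a rough idea of what a script using the Galaxy API looks like, the sketch below builds (without sending) a request that lists the workflows available on a Galaxy server. It relies on Galaxy's REST API exposing workflows under `api/workflows` and accepting the user's API key as a `key` query parameter; the server URL, the key value and the function name are placeholders chosen for this example, not the project's actual scripts.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def workflows_request(galaxy_url, api_key):
    """Build a GET request listing the workflows on a Galaxy server.

    The request is only constructed here; pass it to
    urllib.request.urlopen() to actually contact the server.
    """
    endpoint = urljoin(galaxy_url.rstrip("/") + "/", "api/workflows")
    return Request(endpoint + "?" + urlencode({"key": api_key}), method="GET")


# Placeholder URL and key, for illustration only.
request = workflows_request("http://localhost:8080", "MY_KEY")
```

In practice a client library such as BioBlend wraps the same API and is usually more convenient than hand-built requests.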
Tools used
Code
GitHub and submodules, GitLab
Documentation
Sphinx + Read the Docs + GitHub
Web page
Jekyll + GitHub Pages
Management
Trello, Slack
Lessons learned from this project
Need to define the design correctly
No workflow manager with input/output dependency
Do not reinvent the wheel
Do not prefer a home-made solution
Join an active community
Need for good tools and good habits in big projects
Thank you. Questions?
bebatut.fr
github.com/bebatut
twitter.com/bebatut