The Bioinformatics Training Landscape
Toronto Area Bioinformatics User Group (TorBug)
November 29, 2017
Jason Williams, Cold Spring Harbor Laboratory, DNA Learning Center
@JasonWilliamsNY
What is the Bioinformatics Training Landscape?
What is bioinformatics? How do you define bioinformatics vs. comp bio?
Is bioinformatics biology?
Knowledge
FAIR principles
Data integration
Algorithms and methods
Metadata and standards
Data storage/acquisition/management/sharing
Bioinformatics Buzzword Pyramid
There’s a road ahead, but where are we going?
Talk outline
• What do we know about the needs of researchers and educators?
• Roadblocks and opportunities
• What do you think?
What is the Bioinformatics Training Landscape?
• Started in 2008 as iPlant Collaborative
• $100 million, 10-year NSF investment
• 40K+ users, ~3 PB user data; national and international federation
• Make “Big Data Biology” accessible to the average biologist
330+ workshops, conferences, and seminars since 2009
“How do you rate your level of bioinformatics skills?”
Beginner (or don't use bioinformatics): 53%
Intermediate: 35%
Advanced: 12%
93% of survey respondents indicate they are or will be working with large datasets
Survey of NSF PIs – Unmet needs for Bio Big Data
• 2016: collected 704 responses from 3,987 PIs across the 4 NSF BIO divisions
• Asked about current and future needs in training and infrastructure
• Stratified by research group size/directorate
“With what type(s) of data do you currently work in your research?”
“Is this currently important in your research? Do you think this will be important to your research in 3 years?”
“Does your institution meet this need?” (‘no’ responses)
Training is the biggest need!
• NSF Research Coordination Network; Undergraduate Biology Education
• ~30 undergraduate educators, experts in assessment, industry, and other fields
• https://qubeshub.org/groups/niblse
• Interested in:
o Faculty and student preparation
o Methods and resources for integration
o Assessment
Largest-ever survey on bioinformatics in undergraduate education (U.S.)
• ~1,260 responses
• Some goals:
• Understand faculty perceptions and behaviors in addressing bioinformatics (Is it important? Are you including it? If not, why not?)
• Gather information on bioinformatics-related syllabi (What do you teach, when, and how?)
The composite survey respondent is a white male/female PhD, self-taught in bioinformatics, who earned the degree in 2000–2009. S/he works at a non-minority-serving, doctoral-granting institution with an undergraduate enrollment of less than 5,000.
Question | Number of keyword-coded responses
1. In your opinion, what do you think are the most important challenges currently facing those educating undergraduate life scientists in bioinformatics? | 734 (59.6%)
2A. What is the level of the courses you teach in which you would like to include bioinformatics content? (multiple choice) |
2B. Please describe briefly; include any barriers to development and/or implementation. | 364 (29.6%)
3*. What is preventing you from including bioinformatics content in these courses? | 313 (25.4%)
4. At your current institution, do you face any technical barriers in teaching bioinformatics, e.g., availability of a computer lab, different operating systems, access to high performance computing for teaching, IT support? Please describe. | 511 (41.5%)
* Only asked of those who indicated they are not integrating bioinformatics.
95% of survey respondents indicate that bioinformatics should be integrated into the life science curriculum.
40% of faculty report achieving this.
Training is the biggest need!
41% of survey respondents who are underrepresented minorities (URM) in STEM report training barriers vs. 28% of non-URM faculty.
“Lack of training in the area, as many of us earned our PhDs before bioinformatics was widely used and available.”
“Lack of faculty expertise in the rapidly changing disciplines of genomics/bioinformatics and related technologies.”
“A lack of training and experience in bioinformatics.”
Barriers differ by institution type
MCA: Institution type
23% of survey respondents at minority-serving institutions report integrating bioinformatics vs. 43% at non-minority-serving institutions.
MCA: Integration status
Students taking dedicated courses less prepared?
Decade of highest degree earned | Formal bioinformatics training (%) | Faculty integrating bioinformatics (%)
1980–1989 | 8.4 | 35.4
1990–1999 | 11.3 | 41.9
2000–2009 | 35.1 | 41.7
2010–2016 | 48.3 | 25.2
New, better-trained faculty aren’t integrating bioinformatics!
MCA: Decade of degree
Access to resources differs by sex?
Talk outline
• What do we know about the needs of researchers and educators?
• Researchers and educators still experience a training gap
• Training is a more pressing need than technology/infrastructure
• Research training needs are sophisticated (evolving?)
• Faculty training does not necessarily translate into classroom use
• Disparities in training exist and need to be better understood
Talk outline
• What do we know about the needs of researchers and educators?
• Roadblocks and opportunities
• What do you think?
• Longitudinal study of 294 PhD students from 53 US institutions
• No statistically significant outcomes across 115 variables (e.g., publications/abstracts, writing samples, scholarly engagement)
• Consistent with published pedagogical research that bootcamps cannot lead to sustainable learning*
• Meta-analysis of published assessments in genomics and bioinformatics education
• Most studies assessed summative learning gains
• Fewer than 10% of studies provided reliability/validity evidence for their assessments (only one study provided both)
But doesn’t training work?
Towards evidence-based training and assessment
Some key insights for informal training
See Dr. Tractenberg’s slides: https://www.academia.edu/34607065/Learning_goals_teaching_goals_and_assessment_in_bioinformatics_training
• Remember the difference between formal and informal training: much of the literature assumes an education trajectory (long term), as opposed to training, where skill acquisition is primary
• Good training must have clearly articulated learning outcomes
• Sounds obvious, but learners can only reach the level of skill use you are actually able to convey in your course (Bloom’s Taxonomy)
• Learners need to be guided on how, why, and when skills should be used
• Data Carpentry – Basic data organization/management (R/Linux) and use of command-line bioinformatics tools (Unix)
• Software Carpentry – Basic computing skills including automation (Linux/Make), scripting (Python or R), and version control (Git/GitHub)
Lesson development: 2015
Teaching pilots and refinements: 2016
Global community: 2017
Genomics Data Carpentry: Learning Goals
Interacting with Computers
• Cloud computing
• Connecting to remote computing (SSH/PuTTY)
• File transfer (FileZilla, other command-line tools: scp, rsync, wget, etc.)
Data Management and Organization
• Open source
• Metadata and reproducibility
• Important genomics file formats (CSV/TSV, FASTQ, SAM/BAM, VCF, etc.)
• Organizing a file system for computational projects (Linux)
• Unix shell (command line: ls, cd, mkdir, cp, rm, wc, grep, cut, column, head, tail, less, etc.)
• R: Creating projects, scripts, and examining data
Data Cleaning and Visualization
• R: various packages and functions
• R: dplyr
• R: ggplot2
• FastQC - quality control of high-throughput sequence data
• Trimmomatic - filtering and trimming of high-throughput sequence data
• Integrative Genomics Viewer (IGV)
Automation and Scripting
• R scripting
• 'For' loops
• Building automated pipelines
• Using multithreaded applications
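To make the file-format and automation goals above concrete, here is a small Python sketch (Python being one of the Software Carpentry scripting languages; the Genomics Data Carpentry material itself uses the Unix shell and R). It assumes plain 4-line FASTQ records with Phred+33 quality encoding. The helper names `read_fastq`, `phred33`, `sliding_window_trim`, and `summarize` are hypothetical, and the trimming rule is a deliberately crude simplification of the sliding-window idea, not the actual behavior of Trimmomatic or FastQC.

```python
"""Minimal FASTQ QC loop: a hypothetical sketch, not Carpentry lesson code."""
import io
from typing import Dict, Iterator, List, TextIO, Tuple

# (read ID, sequence, quality string); assumes plain 4-line FASTQ records
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (id, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred33(qual: str) -> List[int]:
    """Decode Phred+33 ASCII quality characters into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4, min_q: float = 20.0) -> str:
    """Crude rule: scanning 5'->3', drop everything from the start of the first
    window whose mean quality is below min_q. Illustrative only; this is not
    Trimmomatic's implementation."""
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_q:
            return seq[:start]
    return seq  # no failing window (or read shorter than the window)


def summarize(samples: Dict[str, TextIO]) -> Dict[str, Dict[str, float]]:
    """Loop over samples (like a shell 'for' loop over files) and collect simple stats."""
    report: Dict[str, Dict[str, float]] = {}
    for name, handle in samples.items():
        n_reads = total_bases = kept_bases = quality_sum = 0
        for _, seq, qual in read_fastq(handle):
            n_reads += 1
            total_bases += len(seq)
            quality_sum += sum(phred33(qual))
            kept_bases += len(sliding_window_trim(seq, qual))
        report[name] = {
            "reads": n_reads,
            "mean_quality": quality_sum / total_bases if total_bases else 0.0,
            "fraction_kept": kept_bases / total_bases if total_bases else 0.0,
        }
    return report


if __name__ == "__main__":
    # Tiny in-memory demo; a real pipeline would pass open file handles instead.
    demo = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGT\n+\nIIII####\n"
    print(summarize({"sample_a": io.StringIO(demo)}))
```

In a real workflow the dictionary values would be open file handles (for example from `open("sample.fastq")`), and the per-sample loop mirrors a shell `for` loop over `*.fastq` files; dedicated tools such as FastQC and Trimmomatic remain the appropriate choice for production QC.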
Carpentry Assessment
The long-term survey assessed confidence, motivation, and other outcomes more than six months after respondents attended a Carpentry workshop.
• 77% of respondents reported being more confident in the tools that were covered during their workshop compared to before the workshop.
• 54% of respondents have made their analyses more reproducible as a result of completing a workshop.
• 65% of respondents have gained confidence in working with data as a result of completing a workshop.
• 74% of respondents have recommended our workshops to a friend or colleague.
Talk outline
• Roadblocks and opportunities
• There is a lot of room for improvement in how we assess the impact of training (including coordination and standardization of disparate efforts)
• Since the majority of bioinformatics education is (and will remain?) informal, training needs to integrate insights from the cognitive sciences
• The Carpentries approach is succeeding in many ways; we need to pay attention to it, improve it, and criticize it
Thanks!
Sorry!
Talk outline
• What do you think?
• How are you involved in training?
• What responsibilities do bioinformaticians have to contribute?
• How can the non-bioinformatician keep up with the pace of change?