Lab 1: Introduction to Python Programming
1/20/17
Slide credits: Nicole Rockweiler

A few preliminary words…

Overview
• Schedule
• Logistics
• Getting started
• Intro to Unix
• Intro to Python
• Assignment 1

Getting the most out of this course
1. Start the homework EARLY
2. Collaborate
3. Use your resources: tutors, TAs, professors, lab mates, discussion groups, and most of all, the internet
4. Think big

Logistics
• Register for 4 credits
• Labs are a continuation of the concepts learned from lectures
• Lab material is generally not tested on exams
• Course website: http://genetics.wustl.edu/bio5488/
• Bring your laptop to every lab

Where to get help (a.k.a. how to maintain your sanity)
• Come to office hours
  • Mondays after class (11:30am-12:30pm) in the 4th floor classroom 4515 McKinley / area outside the classroom, and by appointment
• Come to tutoring sessions
  • Tuesdays 5:30-7pm in 6001B* Scott McKinley Building
  • *4/4 will be in 5001B
  • FREE FOOD!!
• Use the Google Docs to ask/answer questions - https://docs.google.com/spreadsheets/d/11KW_lu9mE59LBtF0X8EtrCJfHQZ22fQwz8AC3AMZSs8/edit?usp=sharing
• [email protected]
• Work in groups

Assignments
• Assignments are posted on the course website Wednesdays at 10am
• Assignments are due the following Wednesday at 10am
• Assignment format
  • Given a bioinformatics problem
  • Write/complete a Python script
  • Analyze data with your script
  • Answer biological questions about your results
• Turn-in format
  • More on this in a bit :)

Schedule
[Figure: weekly timeline running Wed, Thurs, Fri, Sat, Sun, Mon, Tue, Wed]
• Wed: HW released
• Class discussion & work time: 10-11:30am
• Office hours: 11:30-12:30pm
• Tutoring session: 5-7:30pm
• Following Wed: HW due 10am

Schedule (cont.)
Assignment  Released  Due   Topic
1           1/18      1/27  Introduction
2           1/25      2/1   Sequence Comparison
3           2/1       2/8   Next Gen Sequencing
4           2/8       2/15  Gene Expression
5           2/15      2/22  Epigenomics
6           2/22      3/1   Motif Finding
7           3/1       3/22  Synthetic Gene Assembly
8           3/1       3/22  Metagenomics
9           3/22      3/29  Genetic Variation
10          3/29      4/5   Wright-Fisher Model
11          4/5       4/12  TBD
12          4/12      4/19  Substitution Rates
13          4/19      4/26  Cis Regulatory Evolution
Note: 2 labs (assignments 7 and 8) run over spring break

Assignment policies
• See the Course Information → Assignment policies document on the course website
• There are 13 assignments
  • You must turn in all assignments
  • All assignments are weighted equally
• Late policy
  • 25% penalty for turning in an assignment 1 day late
  • Assignments that are >1 day late will be given a 0
  • Email us (early) to request an extension
• Auditors
  • We'll give comments on your programs, but won't grade the short answer questions
  • Same late policy applies
• Collaboration
  • Group work is encouraged, but plagiarism is unacceptable
  • Try to "Google it" first
  • Cite your sources
• Work on the assignment before coming to lab

Grading
• Each assignment is out of 10 points
• Graded on
  • Does the code work?
    • It doesn't have to be the "fastest" or "most efficient" to get full credit
    • If it doesn't work, describe where you had problems
  • Is the code well commented and readable? (more on commenting later)
  • Are the answers correct?
• Grades will be returned in a file called grades.txt on the class server
  • Only you and the TAs will be able to read this file

Getting started

Remote computers
• We will be doing all of our work on a remote computer with the hostname genomic.wustl.edu
• This is a Unix-based computer that we can securely connect to through a protocol called secure shell (SSH)

What is the shell?
• The shell is a program that takes commands from the keyboard and gives them to the operating system to execute
• There are many different shell programs
• We'll be using the most common shell: the Bourne-Again Shell (bash)

How do I access the shell?
• Most of us are familiar with graphical user interfaces (GUIs) to control our computers
• Another way is with command-line interfaces (CLIs)
• A terminal emulator is a program that allows you to interact with the shell through a CLI
• There are many different terminal programs that vary across OSs
  • We'll be using PuTTY (Windows) and Terminal (Mac)
[Figures: a Windows GUI; a PuTTY window; a Terminal window]

Why should I learn how to use shells and terminals?
• CLIs are common in scientific computing → get used to them!
• The shell is a really powerful way of interacting with your computer → become a superuser!

Bio5488 command convention
• We highly recommend that you type all of the commands/code yourself rather than copying and pasting
• Here's an example of a command-line "snippet"

Template:
$ type_me_exactly <modify_me>
output

Example:
$ ls <assignment>
README.txt

• The "$" is called the command prompt. It means, "I'm ready for a command!" Don't type the "$".
• Don't type the "<>"

How to log on to the remote computer (Windows users)
1. Launch PuTTY
2. In the host name field, enter genomic.wustl.edu
3. Enter a session nickname, e.g., bio5488
4. Click Save
5. Click Open

How to log on to the remote computer (Mac users)
1. Open Terminal (found in /Applications/Utilities)
2. SSH to the remote computer. Type:
   ssh <username>@genomic.wustl.edu
   where <username> is replaced with your username
3. A security message may be printed. Type yes and hit enter.
4. Enter your password - it will not show that you are typing! Hit enter.

A couple of notes
• When you log on to the class server you will be located in YOUR home directory.
• Every command that you run after logging on to a remote computer will be run on that computer.

Sublime Text
• Sublime Text is a text editor for writing and editing scripts
• We'll use Sublime to edit both local and remote files
• Documentation: http://www.sublimetext.com/support

Cyberduck
• Cyberduck is a secure file transfer client and will allow you to transfer files from your local computer to a remote computer

Exercise: setting up Cyberduck
• Create a bookmark
  • Launch the Cyberduck application
  • Click Bookmark → New Bookmark
  • Select SFTP (SSH File Transfer Protocol) from the dropdown menu
  • Enter a nickname for the bookmark, e.g., bio5488
  • Enter genomic.wustl.edu as the server name
  • Click the X
• Set the default text editor
  • Click Cyberduck/Edit → Preferences → Editor
  • Select Sublime Text from the dropdown menu. (You may need to browse your computer for the editor)
  • Check Always use this application
  • Restart Cyberduck

Exercise: transferring files with Cyberduck
• To download a file to your local computer
  • Drag and drop a file from Cyberduck to your Finder/File Explorer window
  • Or, double-click
• To upload a file to the remote computer
  • Drag and drop a file from Finder/File Explorer to Cyberduck

Exercise: editing remote files with Sublime Text and Cyberduck
• New files
  • Click File → New file
  • Enter a filename
  • Click Edit
  • Sublime Text should now launch
  • Add some text to the file
  • Click File → Save or ctrl+s
• Existing files
  • Select the file by clicking the filename 1X
  • Click the Edit button in the navigation bar
  • Edit the file
  • Click File → Save or ctrl+s

Basic Unix

The filesystem
• The filesystem is the part of the operating system (OS) responsible for managing files and folders
  • In Unix, folders are called directories.
• Unix keeps files arranged in a hierarchical structure
  • The topmost directory is called the root directory
  • Each directory can contain
    • Files
    • Subdirectories
• You will always be "in" a directory
  • When you open a terminal you will be in your own home directory.
  • Only you can modify things in your home directory

Determining where you are (pwd)
• If you get lost in the filesystem, you can determine where you are by typing:
  $ pwd
  /home/aclemens
• pwd stands for print working directory
• pwd prints the full path of the current working directory

Listing directory contents (ls)
• To list the contents of a directory:
  $ ls
  assignment1 foo
• ls stands for list directory contents

Changing directories (cd)
• To change to a different directory
  $ cd <directory_name>
  where <directory_name> = the path you want to move to
• A path is a location in the filesystem
• cd stands for change directory
• To get back to your home directory
  $ cd ~
• ~ is shorthand for your home directory

Changing directories (cont.)
• To move one directory above the current directory
  $ cd ../
• To move two directories above the current directory
  $ cd ../../
• You can string together as many ../ as you need to

Making directories (mkdir)
• To make a directory
  $ mkdir <new_directory_name>
  where <new_directory_name> = name of the directory to create
• mkdir stands for make directory
• Do not use spaces or "/" in directory or file names

Exercise: create some directories
Try to create this directory structure:
[Figure: directory structure to create]
Hints
• Use pwd to determine where you are in the directory structure
• Use cd to navigate through the directory structure.
• Use mkdir to create new directories

Copying things (cp)
• To create a copy of a file
  $ cp -i <filename> <copy_of_filename>
  where
    <filename> = file you want to copy
    <copy_of_filename> = name of copied file
  The -i flag is a safety feature to make sure you do not overwrite a file that already exists (interactive)
• To create a copy of a directory
  $ cp -r <directory> <copy_of_directory>
  where
    <directory> = directory you want to copy
    <copy_of_directory> = name of copied directory
  The -r flag is required to copy all of the directory's files and subdirectories

Copying things (cont.) (cp)
• cp stands for copy files/directories
• To create a copy of a file and keep the name the same
  $ cp -i <filename> .
  where <filename> = file you want to copy
• The shortcut is the same for directories, just remember to include the -r flag

Exercise: copying things
Copy /home/assignments/assignment1/README.txt to your work directory. Keep the name the same.

Renaming/moving things (mv)
• To rename/move a file/directory
  $ mv -i <original_filename> <new_filename>
  where
    <original_filename> = name of file/dir you want to rename
    <new_filename> = name you want to rename it to
• mv stands for move files/directories

Printing contents of files (cat)
• To print a file
  $ cat <filename>
  where <filename> = name of file you want to print
• cat stands for concatenate file and print to the screen
• Other useful commands for printing parts of files:
  • more
  • less
  • head
  • tail

Exercise: printing contents of files
Print the contents of your README.txt
Experiment with using different commands, e.g., cat, head, and tail. How do the commands differ?

Deleting things (rm)
• To delete a file
  $ rm <file_to_delete>
  where <file_to_delete> = name of the file you want to delete
• To delete a directory
  $ rm -r -i <directory_to_delete>
  where <directory_to_delete> = name of the directory you want to delete
• rm stands for remove files/directories
IMPORTANT: there is no recycle bin/trash folder on Unix!! Once you delete something, it is gone forever.
Be very careful when you use rm!!
TIP: Check that you're going to delete the correct files by first testing with 'ls' and then committing to 'rm'

Exercise: deleting things
Delete the test directory that you created in a previous exercise.

Saving output to files
• Save the output to a file
  $ <cmd> > <output_file>
  where
    <cmd> = command
    <output_file> = name of output file
• WARNING: this will overwrite the output file if it already exists!
• Append the output to the end of a file (there are 2 ">")
  $ <cmd> >> <output_file>

Learning more about a command (man)
• To view a command's documentation
  $ man <cmd>
  where <cmd> = command
• man stands for manual page
• Use the ↑ and ↓ arrow keys to scroll through the manual page
• Type "q" to exit the manual page

Exercise: reading documentation
Determine what the following command does
$ cal

Getting yourself out of trouble
• Abort a command: Ctrl+C
• Temporarily stop a command: Ctrl+Z
• Resume a stopped job
  $ fg <job_id>

Unix commands cheat sheet -- your new bestie
https://ubuntudanmark.dk/filer/fwunixref.pdf

Assignment 1

How to complete & "turn in" assignments
1. Create a separate directory for each assignment
2. Create "submission" and "work" subdirectories
  • Work = scratch work
  • Submission = final version
  • The TAs will only grade content that is in your submission directory
3. Copy the starter scripts and README to your work directory
4. Copy the final version of the files to your submission directory
  • Don't touch the submission folder again! Timestamps of the files are used to determine if the assignment was turned in on time

README files
• A README.txt file contains information on how to run your code and answers to any of the questions in the assignment
• A template will be provided for each assignment
  • Copy the template to your work folder
  • Replace the text in {} with your answers
  • Leave all other lines alone :)

A README.txt template:
Question 1:
{nuc_count.py nucleotide count output}
-
Comments:
{Things that went wrong or you can not figure out}
-

A filled-out README.txt:
Question 1:
A: 10
C: 15
G: 20
T: 12
-
Comments:
The wording for part 2 was confusing.
-

Usage statements in README.txt
• Purpose
  • Tells a user (you, TA, anyone unfamiliar with your code) how to run the script
  • Documents how you created your results
• Good practices
  • Write out exactly how you ran the script:
    python3 foo.py 10 bar
  • AND/OR, write out how to run the script in general, i.e., with placeholders for command-line arguments:
    python3 foo.py <#_of_genes> <gene_of_interest>
• TIP: copy and paste your commands into your README
• TIP: use the command history to view previous commands (up arrow)

Assignment 1 TODOs
• Download chr20 via FTP (here we use wget)
• You will be given a starter script (nuc_count.py) that counts the total number of A, C, G, T nucleotides
  • Modify the script to calculate the nucleotide frequencies
  • Modify the script to calculate the dinucleotide frequencies
• Modify a starter script (make_seq.py) to generate a random sequence given nucleotide frequencies
  • Use make_seq.py to generate a random sequence with the same nucleotide frequencies as chr20
  • Compare the chr20 di/nucleotide frequencies (observed) with the random model (expected)

Fasta file format
• A standard text-based file format used to define sequences, e.g., nucleotide or peptide sequences
• .fa or .fasta extension
• Each sequence is defined by multiple lines
  • Line 1: Description of sequence. Starts with ">"
  • Lines 2-N: Sequence
• A fasta file can contain ≥1 sequence

Example fasta file:
>chr22
ACGGTACGTACCGTAGATNAGTAN
>chr23
ACCGATGTGTGTAGGTACGTNACGTAGTGATGTAT

Requirements
• Due next Friday (1/27) at 10am
• Your submission folder should contain:
  □ A Python script to count nucleotides (nuc_count.py)
  □ A Python script to make a random sequence file (make_seq.py)
  □ An output file with a random sequence (random_seq_1M.txt)
  □ A README.txt file with instructions on how to run your programs and answers to the questions.
• Remember to comment your script!

Python basics
Recycling Nicole's slides from year 2016*

What is Python?
• Python is a widely used programming language
• First implemented in 1989 by Guido van Rossum
• Free, open-source software with community-based development
• Trivia: Python is named after the BBC show "Monty Python's Flying Circus" and has nothing to do with reptiles
• Van Rossum is known as a "Benevolent Dictator For Life" (BDFL)

Which Python?
• There are 2 widely used versions of Python: Python 2.7 and Python 3.x
• We'll use Python 3
• Many help forums still refer to Python 2, so make sure you're aware which version is being referenced

Interacting with Python
There are 2 main ways of interacting with Python:
[Figure: the two ways of interacting with Python]
• ">>>" is Python's command prompt. It means, "I'm ready for a command!" Don't type the ">>>"

Variables
• The most basic components of any programming language are "things," also called variables
• A variable has a name and an associated value
• The most common types of variables in Python are:

Type      Description                       Example
Integers  A whole number                    x = 10
Floats    A real number                     x = 5.6
Strings   Text (1 or more characters)       x = "Genomics"
Booleans  A binary outcome: true or false   x = True

• For strings, you can use single quotes or double quotes

Variables (cont.)
• To save a variable, use =
  >>> x = 2
• To determine what type of variable, use the type function
  >>> type(x)
  <class 'int'>
• IMPORTANT: the variable name must be on the left-hand side of the =
  >>> x = 2    (correct: the name of the variable is on the left, the value of the variable is on the right)
  >>> 2 = x    (incorrect)

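The four basic types and the type function can be tried together. This is only a minimal sketch; the variable names and values are made up for illustration and are not part of the assignment:

```python
# Hypothetical example values, one per basic type
sample_id = "S01"     # string
gc_content = 0.41     # float
read_count = 1500     # integer
passed_qc = True      # boolean

# type() reports the type of each variable
print(type(sample_id))
print(type(gc_content))
print(type(read_count))
print(type(passed_qc))
```
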
Variable naming (best) practices
• Must start with a letter
• Can contain letters, numbers, and underscores ← no spaces!
• Python is case-sensitive: x ≠ X
• Variable names should be descriptive and have reasonable length
• Use ALL CAPS for constants, e.g., PI
• Do not use names already reserved for other purposes (min, max, int)
Want to learn more tips? Check out http://www.makinggoodsoftware.com/2009/05/04/71-tips-for-naming-variables/

Exercise: defining variables
• Create variables for the following
  • Your favorite gene name
  • The expression level of a gene
  • The number of upregulated genes
  • Whether the HOXA1 gene was differentially expressed
• What is the type of each variable?

Cheatsheet
Collections of things
• Why is this concept useful?
  • We often have collections of things, e.g.,
    • A list of genes in a pathway
    • A list of gene fusions in a cancer cell line
    • A list of probe IDs on a microarray and their intensity value
• We could store each item in a collection in a separate variable, e.g.,
  gene1 = 'SUCLA2'
  gene2 = 'SDHD'
  ...
• A better strategy is to put all of the items in one container
• Python has several types of containers
  • List (similar to arrays)
  • Set
  • Dictionary

Lists: what are they?
• Lists hold a collection of things in a specified order
  • The things do not have to be the same type
• Many methods can be used to manipulate lists.

Action         Syntax                             Output
Create a list  <list_name> = [<item1>, <item2>]
Index a list   <list_name>[<position>]            'SDHD'

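Creating and indexing a list might look like the sketch below. The gene names are borrowed from the gene1/gene2 example in these slides; the variable names are assumptions for illustration:

```python
# Create a list (gene names taken from the earlier gene1/gene2 example)
genes = ['SUCLA2', 'SDHD']

# Index a list: positions start at 0, so position 1 is the second item
second_gene = genes[1]
print(second_gene)

# The items in a list do not have to be the same type
mixed = ['SDHD', 11, 0.5, True]
print(len(mixed))
```
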
Lists: where can I learn more?
• Python.org tutorial: https://docs.python.org/3.4/tutorial/datastructures.html#more-on-lists
• Python.org documentation: https://docs.python.org/3.4/library/stdtypes.html#list

Doing stuff to variables
• There are 3 common tools for manipulating variables
  • Operators
  • Functions
  • Methods

Operators
• Operators are a special type of function:
  • Operators are symbols that perform some mathematical or logical operation
• Basic mathematical operators:

Operator  Description     Example
+         Addition        >>> 2 + 3
                          5
-         Subtraction     >>> 2 - 3
                          -1
*         Multiplication  >>> 2 * 3
                          6
/         Division        >>> 2 / 3
                          0.6666666666666666

Operators (cont.)
You can also use operators on strings! (Is it a bird? Is it a plane? No, it's a string!)

Operator  Description                      Example
+         Combine strings together         >>> 'Bio' + '5488'
                                           'Bio5488'
                                           >>> 'Bio' + 5488
                                           Traceback (most recent call last):
                                             File "<stdin>", line 1, in <module>
                                           TypeError: Can't convert 'int' object to str implicitly
*         Repeat a string multiple times   >>> 'Marsha' * 3
                                           'MarshaMarshaMarsha'

• Strings and ints cannot be combined with +

Relational operators
• Relational operators compare 2 things
• Return a boolean

Operator  Description               Example
<         Less than                 >>> 2 < 3
                                    True
<=        Less than or equal to     >>> 2 <= 3
                                    True
>         Greater than              >>> 2 > 3
                                    False
>=        Greater than or equal to  >>> 2 >= 3
                                    False
==        Equal to                  >>> 2 == 3
                                    False
!=        Not equal to              >>> 2 != 3
                                    True

• == is used to test for equality
• = is used to assign a value to a variable

Logical operators
• Perform a logical function on 2 things
• Return a boolean

Operator  Description                                 Example
and       Return True if both arguments are true      >>> True and True
                                                      True
                                                      >>> True and False
                                                      False
or        Return True if either argument is true      >>> True or False
                                                      True
                                                      >>> False or False
                                                      False

Functions: what are they?
• Why are functions useful?
  • Allow you to reuse the same code
  • Programmers are lazy!
• A block of reusable code used to perform a specific task
  • Take in arguments (optional)
  • Do something
  • Return something (optional)
• Similar to mathematical functions, e.g., f(x) = x^2
• 2 types:
  • Built-in: functions prewritten for you
    • print: print something to the terminal
    • float: convert something to a floating point #
  • User-defined: you create your own functions

Functions: how can I call a function?

Action                                     Syntax                            Output
Call a function that takes no arguments    <function_name>()
Call a function that takes argument(s)     <function_name>(<arg1>, <arg2>)   8

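Both call forms can be sketched as follows. The particular functions are chosen here only for illustration (max was picked because it yields 8, matching the output shown in the table; which call the original slide used is not known), and square is a hypothetical user-defined function:

```python
# Call a built-in function with no arguments: print() prints an empty line
print()

# Call built-in functions with arguments
largest = max(3, 8)        # the largest of the arguments
as_float = float("5.6")    # convert a string to a floating point number
print(largest)
print(as_float)

# A user-defined function (hypothetical example), similar to f(x) = x^2
def square(x):
    """Return x multiplied by itself."""
    return x * x

print(square(4))
```
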
Python functions: where can I learn more?
• Python.org tutorial
  • User-defined functions: https://docs.python.org/3/tutorial/controlflow.html#defining-functions
• Python.org documentation
  • Built-in functions: https://docs.python.org/3/library/functions.html

Methods: what are they?
• First a preamble...
  • Methods are a close cousin of functions
  • For this class we'll treat them as basically the same
  • The syntax for calling a method is different than for a function
  • If you want to learn about the differences, google object oriented programming (OOP)
• Why are methods useful?
  • Allow you to reuse the same code

String methods
• <str>.upper(): returns the string with all letters uppercased
  >>> x = "Genomics"
  >>> x.upper()
  'GENOMICS'
• <str>.lower(): returns the string with all letters lowercased
  >>> x.lower()
  'genomics'
• <str>.find(<pattern>): returns the first index of <pattern> in the string; returns -1 if <pattern> is not found
  >>> x.find('nom')
  2
• <str>.count(<pattern>): returns the number of times <pattern> is found in the string
  • HINT: explore how .count deals with overlapping patterns
  >>> x.count('g')
  0
• <str>[<index>]: returns the letter at the <index>th position
  >>> x[1]
  'e'

Index:   0 1 2 3 4 5 6 7
Letter:  G e n o m i c s

https://docs.python.org/3.4/library/stdtypes.html?#string-methods
https://docs.python.org/3/library/stdtypes.html#str

Making choices (conditional statements)
• Why is this concept useful?
  • Often we want to check if a condition is true and take one action if it is, and another action if the condition is false
  • E.g., if the alternative allele read coverage at a particular location is high enough, annotate the position as a SNP; otherwise, annotate the position as reference

Conditional statement syntax
• If (example output: x is positive)
  if <condition>:
      # Do something
• If/else (example output: x is NOT positive)
  if <condition>:
      # Do something
  else:
      # Do something else
• If/else if/else (example output: x is negative)
  if <condition1>:
      # Do something
  elif <condition2>:
      # Do something else
  else:
      # Do something else

Indentation matters!!! Indent the lines of code that belong to the same code block. Use 1 tab.

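The three forms can be sketched with a single hypothetical value of x; the messages mirror the example outputs from the slide, while the value, the variable names, and the "x is zero" branch are assumptions made for illustration. This sketch indents with 4 spaces per level; whichever style you use, keep it consistent within a script.

```python
x = -3  # hypothetical value

# If
if x > 0:
    print("x is positive")

# If/else
if x > 0:
    message = "x is positive"
else:
    message = "x is NOT positive"
print(message)

# If / else if / else
if x > 0:
    sign = "x is positive"
elif x < 0:
    sign = "x is negative"
else:
    sign = "x is zero"
print(sign)
```
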
Commenting your code
• Why is this concept useful?
  • Makes it easier for you, your future self, TAs :), and anyone unfamiliar with your code to understand what your script is doing
• Comments are human-readable text. They are ignored by Python.
• Add comments for
  • The how
    • What the script does
    • How to run the script
    • What a function does
    • What a block of code does
  • The why
    • Biological relevance
    • Rationale for design and methods
    • Alternatives
TREAT YOUR CODE LIKE A LAB NOTEBOOK

Commenting rule of thumb
"Always code [and comment] as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability."
-- John Woods

Commenting your code (cont.)
• Commenting is extremely important!
• Points will be deducted if you do not comment your code
• If you use code from a resource, e.g., a website, cite it

Comment syntax
• Block comment
  # <your_comment>
  # <your_comment>
• In-line comment
  <code> # <your_comment>

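Both comment styles in a short sketch; the sequence and variable names are hypothetical:

```python
# Block comment: describes the lines of code that follow.
# It can span several lines, each starting with "#".
sequence = "ACGTTGATACGTA"

num_a = sequence.count("A")  # In-line comment: count the A nucleotides
print(num_a)
```
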
Python modules
• A module is a file containing Python definitions and statements for a particular purpose, e.g.,
  • Generating random numbers
  • Plotting
• Modules must be imported at the beginning of the script
  • This loads the variables and functions from the module into your script, e.g.,
    import sys
    import random
• To access a module's features, type <module>.<feature>, e.g.,
  sys.exit()

Random module
• Contains functions for generating random numbers for various distributions
• TIP: will be useful for assignment 1

Function        Description
random.choice   Return a random element from a list
random.randint  Return a random integer in a given range
random.random   Return a random float in the range [0, 1)
random.seed     Initialize the (pseudo) random number generator

https://docs.python.org/3.4/library/random.html

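The four functions in the table can be tried in a few lines. This is a sketch only; the seed value and variable names are arbitrary choices for illustration:

```python
import random

# Arbitrary seed so that repeated runs give the same "random" results
random.seed(5488)

nucleotide = random.choice(['A', 'C', 'G', 'T'])  # one random element of the list
die_roll = random.randint(1, 6)                   # integer from 1 to 6, both ends included
fraction = random.random()                        # float in the range [0, 1)

print(nucleotide, die_roll, fraction)
```
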
How to repeat yourself (for loops)
• Why is this useful?
  • Often, you want to do the same thing over and over again
    • Calculate the length of each chromosome in a genome
    • Look up the gene expression value for every gene
    • Align each RNA-seq read to the genome
• A for loop takes out the monotony of doing something a bazillion times by executing a block of code over and over for you
  • Remember, programmers are lazy!
• A for loop iterates over a collection of things
  • Elements in a list
  • A range of integers
  • Keys in a dictionary

For loop syntax
for <counter> in <collection_of_things>:
    # Do something

• Example output 1: Hello! printed 10 times, one per line
• Example output 2: the integers 0 through 9, one per line
• The <counter> variable is the value of the current item in the collection of things
  • You can ignore it
  • You can use its value in the loop
• All code in the for loop's code block is executed at each iteration
• TIP: If you find yourself repeating something over and over, you can probably convert your code to a for loop!

Indentation matters!!! Indent the lines of code that belong to the same code block. Use 1 tab.

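Loops along these lines would produce the two example outputs described on the slide. This is a sketch; the counter name i and the gene list are assumptions for illustration:

```python
# Ignore the counter: print the same greeting 10 times
for i in range(10):
    print("Hello!")

# Use the counter: print the integers 0 through 9
for i in range(10):
    print(i)

# Iterate over the elements of a list (hypothetical gene names)
gene_lengths = []
for gene in ['SUCLA2', 'SDHD']:
    gene_lengths.append(len(gene))  # number of characters in each name
print(gene_lengths)
```
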
Which option would you rather do?
[Figure: two code options, A and B]

How to repeat yourself (cont.)
• For loops have a close cousin called while loops
• The major difference between the 2
  • For loops repeat a block of code a predetermined number of times (really, a collection of things)
  • While loops repeat a block of code as long as an expression is true
    • e.g., while it's snowing, repeat this block of code
  • While loops can turn into infinite while loops → the expression is never false so the loop never exits. Be careful!
• See http://learnpythonthehardway.org/book/ex33.html for a tutorial on while loops

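A while loop can be sketched like this; the countdown variable and its starting value are made up for illustration:

```python
countdown = 3  # hypothetical starting value

# Repeat the block as long as the expression "countdown > 0" is true
while countdown > 0:
    print(countdown)
    countdown = countdown - 1  # without this line the loop would never exit

print("Done")
```
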
Command-line arguments
• Why are they useful?
  • Passing command-line arguments to a Python script allows a script to be customized
• Example
  • make_seq.py can create a random sequence of any length
  • If the length wasn't a command-line argument, the length would be hard-coded
    • To make a 10bp sequence, we would have to 1) edit the script, 2) save the script, and 3) run the script.
    • To make a 100bp sequence, we'd have to 1) edit the script, 2) save the script, and 3) run the script.
  • This is tedious & error-prone
  • Remember: be a lazy programmer!

Command-line arguments
• Python stores the command-line arguments as a list called sys.argv
  • sys.argv[0] # script name
  • sys.argv[1] # 1st command-line argument
  • …
• IMPORTANT: arguments are passed as strings!
  • If the argument is not a string, convert it, e.g., int(), float()
• sys.argv is a list of variables
  • The values of the variables, e.g., the A frequency, are not "plugged in" until the script is run
  • Use A_freq to stand for the A frequency that was passed as a command-line argument

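The string-to-number conversion can be sketched without running a script from the command line by using a stand-in list in place of sys.argv. The argument order, the values, and the names example_argv and seq_length are assumptions for illustration; in a real script you would import sys and read sys.argv instead:

```python
# Stand-in for sys.argv, as if the script had been run as:
#   python3 make_seq.py 100 0.3
# (this argument order is an assumption made for the example)
example_argv = ["make_seq.py", "100", "0.3"]

script_name = example_argv[0]       # script name
seq_length = int(example_argv[1])   # 1st argument, converted from a string to an int
A_freq = float(example_argv[2])     # 2nd argument, converted from a string to a float

print(script_name, seq_length, A_freq)
```
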
Reading (and writing) to files in Python
Why is this concept useful?
• Often your data is much larger than just a few numbers:
  • Billions of base pairs
  • Millions of sequencing reads
  • Thousands of genes
• It may not be feasible to write all of this data in your Python script
  • Memory
  • Maintenance
How do we solve this problem?

Reading (and writing) to files in Python
The solution:
• Store the data in a separate file
• Then, in your Python script
  • Read in the data (line by line)
  • Analyze the data
  • Write the results to a new output file or print them to the terminal
• When the results are written to a file, other scripts can read in the results file to do more analysis
[Figure: Input file → Python script 1 → Output file 1 → Python script 2 → Output file 2]

Reading a file syntax
with open(<file>, 'r') as <file_handle>:
    for <current_line> in <file_handle>:
        <current_line> = <current_line>.rstrip()
        # Do something

Example output:
>chr1
ACGTTGATACGTA

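A self-contained sketch of the read-a-file pattern: it first writes a small example file (its name and location are made up for this example) and then reads it back line by line, printing the same two lines as the example output from the slide:

```python
import os
import tempfile

# Write a small hypothetical input file so the example can run on its own
input_path = os.path.join(tempfile.mkdtemp(), "example.fa")
with open(input_path, "w") as out_file:
    out_file.write(">chr1\nACGTTGATACGTA\n")

# Read the file line by line
lines = []
with open(input_path, "r") as in_file:
    for line in in_file:
        line = line.rstrip()  # remove the trailing newline
        lines.append(line)
        print(line)
```
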
The anatomy of a (simple) script
• Shebang
  • The first line should always be #!/usr/bin/env python3
  • This special line is called a shebang
  • The shebang tells the computer how to run the script
  • It is NOT a comment
• Docstring
  • A docstring, or documentation string, is a special type of comment
  • Docstrings are used to explain 1) what the script does and 2) how to run it
  • ALWAYS include a docstring
  • Docstrings are enclosed in triple quotes, """
• Comment
  • Comments help the reader better understand the code
  • Always comment your code!
• Import statement
  • An import statement loads variables and functions from an external Python module
  • The sys module contains system-specific parameters and functions
• Command-line argument
  • The script grabs the command-line argument using sys.argv and stores it in a variable called name
• Print statement
  • The script prints a statement to the terminal using the print function
  • The first list of arguments are the items to print
  • The argument sep="" says do not print a delimiter (i.e., a separator) between the items
  • The default separator is a space.

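A script with all of the parts described above might look like the sketch below. It is not the script from the original slide (which was shown as an image); the file name hello.py, the greeting text, and the "World" fallback are assumptions added so the example also runs without arguments:

```python
#!/usr/bin/env python3
"""
Print a greeting for the name given on the command line.

Usage: python3 hello.py <name>
"""

# Load the sys module to get access to the command-line arguments
import sys

# Grab the first command-line argument and store it in a variable called name.
# The "World" fallback is an addition for this sketch (used when no argument is given).
name = sys.argv[1] if len(sys.argv) > 1 else "World"

# Print the greeting with no separator between the items
print("Hello, ", name, "!", sep="")
```

Running `python3 hello.py Nicole` would greet Nicole; with the default separator (a space) instead of sep="", the items would be printed with spaces between them.
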