Date posted: 17-Dec-2015
Uploaded by: darcy-ward
Introduction to Bioinformatics
Fall 2008
Administration
Adi Doron
Nimrod Rubinstein
Dudu Burstein
Reception hours: by appointment, Britania 405, 6409245
Course Website
http://bioinfo.tau.ac.il/~intro_bioinfo/
Exercises
Each student participates once every 2 weeks, in one of the following sessions:
• Sunday 16:00-18:00
• Monday 12:00-14:00
• Monday 14:00-16:00
Computer classroom, Sherman 03
Requirements
• Exam – 80% of final grade
• Assignments – 20% of final grade (compulsory)
Assignments include class works and home works:
• Class works are planned to be completed during the exercise. They should be mailed to the TA. They will be checked but not graded.
• Home works should be handed in at the following exercise (2 weeks after the hand-out date). They will be checked and graded.
Goals
• To familiarize the students with research topics in bioinformatics and with bioinformatic tools
• The emphasis will be on tools and their use
Prerequisites
• Familiarity with topics in molecular biology (cell biology and genetics)
• Basic familiarity with computers & the internet
BIOINFORMATIC DATABASES
What's in a database?
• Sequences – genes, proteins, etc.
• Full genomes
• Annotation – information about the gene/protein:
  - function
  - cellular location
  - chromosomal location
  - introns/exons
  - protein structure
  - phenotypes, diseases
• Publications
NCBI and Entrez
• One of the largest and most comprehensive databases, belonging to the NIH – the National Institutes of Health (USA)
• Entrez is the search engine of NCBI. Search for: genes, proteins, genomes, structures, diseases, publications and more.
http://www.ncbi.nlm.nih.gov/
Search for published papers
Yang X, Kurteva S, Ren X, Lee S, Sodroski J. "Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells", J Virol. 2006 May;80(9):4388-95.
Use fields!
Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]
For the full list of field tags, go to Help -> Search Field Descriptions and Tags
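The field-tagged query above is plain text: each term gets its tag in square brackets, and the terms are joined with AND. A minimal Python sketch of assembling such a string from (term, tag) pairs is shown below. The helper names `field_term` and `build_query` are hypothetical and are not part of any NCBI library; the output is simply what you would type into the Entrez search box.

```python
from __future__ import annotations


def field_term(value: str, tag: str) -> str:
    """Attach an Entrez field tag to a term, e.g. ('Yang', 'AU') -> 'Yang[AU]'."""
    return f"{value}[{tag}]"


def build_query(terms: list[tuple[str, str]]) -> str:
    """Join (value, tag) pairs into one query string with AND."""
    return " AND ".join(field_term(value, tag) for value, tag in terms)


query = build_query([
    ("Yang", "AU"),          # author
    ("glycoprotein", "TI"),  # word in the title
    ("2006", "DP"),          # date of publication
    ("J virol", "TA"),       # journal title abbreviation
])
print(query)  # Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]
```

The same string can also be sent to NCBI from a script, for example through Biopython's `Entrez.esearch(db="pubmed", term=query)`, though typing it into the web search box works just as well.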
Exercise
Retrieve all publications in which the first author is Pe'er I and the last author is Shamir R
Using Limits
Retrieve the publications of Friedman N in the journals Bioinformatics and Journal of Computational Biology from the last 5 years
Google Scholar
http://scholar.google.com/
NCBI gene & protein databases: GenBank
• GenBank is an annotated collection of all publicly available DNA sequences.
• Holds 65 billion bases (Oct. 2007)
• GenPept is a database of translated coding sequences from GenBank
Searching for CD4 human using Entrez
Search demonstration
Using Field Descriptions, Qualifiers, and Boolean Operators
Cd4[GENE] AND human[ORGN]
or
Cd4[gene name] AND human[organism]
List of field codes: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers
Boolean operators: AND, OR, NOT
Note: do not use the Protein name [PROT] field, only GENE!
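Boolean operators can be nested to build more specific searches, like the Cd4 query above. The Python sketch below only assembles query strings; it wraps every combination in parentheses so that the grouping is explicit no matter how the search engine orders operators. The helpers `tagged` and `combine` are hypothetical illustrations, and the mouse examples are made up to show OR and NOT.

```python
from __future__ import annotations

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def tagged(value: str, field: str) -> str:
    """Attach an Entrez field tag, e.g. tagged('Cd4', 'GENE') -> 'Cd4[GENE]'."""
    return f"{value}[{field}]"


def combine(operator: str, *parts: str) -> str:
    """Join two or more sub-queries with one Boolean operator, in parentheses."""
    if operator not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported Boolean operator: {operator!r}")
    if len(parts) < 2:
        raise ValueError("need at least two sub-queries to combine")
    return "(" + f" {operator} ".join(parts) + ")"


# Cd4 gene in human only
human_cd4 = combine("AND", tagged("Cd4", "GENE"), tagged("human", "ORGN"))

# Cd4 gene in human or mouse (OR nested inside AND)
cd4_human_or_mouse = combine(
    "AND",
    tagged("Cd4", "GENE"),
    combine("OR", tagged("human", "ORGN"), tagged("mouse", "ORGN")),
)

# Cd4 gene entries, excluding mouse ones
cd4_not_mouse = combine("NOT", tagged("Cd4", "GENE"), tagged("mouse", "ORGN"))

print(human_cd4)           # (Cd4[GENE] AND human[ORGN])
print(cd4_human_or_mouse)  # (Cd4[GENE] AND (human[ORGN] OR mouse[ORGN]))
print(cd4_not_mouse)       # (Cd4[GENE] NOT mouse[ORGN])
```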
RefSeq
REFSEQ: a sub-collection of NCBI databases containing only non-redundant, highly annotated entries (genomic DNA, transcripts (RNA), and protein products)
An explanation of GenBank records
Accession Numbers
Database | Format | Example
GenBank / EMBL | One letter followed by five digits | U12345
GenBank / EMBL | Two letters followed by six digits | AY123456
GenPept (a.a. translations of GenBank) | Three letters followed by five digits | AAA12345
RefSeq | Distinct prefix of 2 characters + underscore, which distinguishes RefSeq from GenBank accessions (NM_: nucleotide, NP_: protein) | NP_015325
SWISS-PROT (another protein database) | Six characters: 1 [O,P,Q], 2 [0-9], 3 [A-Z,0-9], 4 [A-Z,0-9], 5 [A-Z,0-9], 6 [0-9] | P12345, Q9JJS7
PDB (Protein Data Bank – structure database) | One digit followed by three alphanumeric characters | 1hxw
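The accession formats listed above are regular enough to check with regular expressions. A minimal Python sketch follows; the patterns are transcribed from this slide only, and the databases have added further formats since, so treat them as illustrative rather than a complete validator. The names `ACCESSION_PATTERNS` and `classify_accession` are hypothetical. Because formats overlap, the function returns every match instead of guessing one.

```python
from __future__ import annotations

import re

# Patterns transcribed from the slide's table (illustrative, not exhaustive).
ACCESSION_PATTERNS = {
    "GenBank/EMBL (1 letter + 5 digits)": r"[A-Z][0-9]{5}",
    "GenBank/EMBL (2 letters + 6 digits)": r"[A-Z]{2}[0-9]{6}",
    "GenPept": r"[A-Z]{3}[0-9]{5}",
    "RefSeq": r"[A-Z]{2}_[0-9]+",
    "Swiss-Prot": r"[OPQ][0-9][A-Z0-9]{3}[0-9]",
    "PDB": r"[0-9][A-Z0-9]{3}",
}

# RefSeq prefixes mentioned on the slide.
REFSEQ_PREFIXES = {"NM": "nucleotide", "NP": "protein"}


def classify_accession(accession: str) -> list[str]:
    """Return the names of all formats that the accession fits."""
    # Normalize case (PDB IDs are often written in lower case) and drop
    # a trailing version suffix such as ".1".
    acc = re.sub(r"\.[0-9]+$", "", accession.strip().upper())
    matches = []
    for name, pattern in ACCESSION_PATTERNS.items():
        if re.fullmatch(pattern, acc):
            if name == "RefSeq":
                name = f"RefSeq ({REFSEQ_PREFIXES.get(acc[:2], 'other')})"
            matches.append(name)
    return matches


for example in ["AY123456", "U12345", "AAA12345", "NP_015325", "P12345", "1hxw"]:
    print(example, classify_accession(example))
```

Note that P12345 fits both the one-letter GenBank/EMBL format and the Swiss-Prot format, so an accession number alone does not always tell you which database it came from.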
Swiss-Prot
A protein sequence database that strives to provide a high level of annotation:
* the function of a protein
* domain structure
* post-translational modifications
* variants
One entry for each protein
GenBank vs. Swiss-Prot
(Screenshots: GenBank results vs. Swiss-Prot results)
Downloading & FASTA format
FASTA format:
>sp|P01730|CD4_HUMAN T-cell surface glycoprotein CD4 precursor
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI
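A FASTA record is a header line starting with ">" followed by one or more lines of sequence. The Python sketch below reads such text into (header, sequence) pairs and splits a header laid out as `db|accession|entry_name description`, the layout of the Swiss-Prot example above. The function names are hypothetical, and the example uses only the first 47 residues of the CD4 sequence, wrapped over two lines to show that wrapped sequence is joined back together. For real work, Biopython's `SeqIO.parse(handle, "fasta")` does the same job.

```python
from __future__ import annotations


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs.

    A header line starts with '>'; every following line up to the next
    header is sequence, which may be wrapped over several lines.
    """
    records: list[tuple[str, str]] = []
    header: str | None = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:  # close the last record
        records.append((header, "".join(chunks)))
    return records


def split_uniprot_header(header: str) -> dict[str, str]:
    """Split a header of the form 'db|accession|entry_name description'."""
    identifiers, _, description = header.partition(" ")
    db, accession, entry_name = identifiers.split("|")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


example = """>sp|P01730|CD4_HUMAN T-cell surface glycoprotein CD4 precursor
MNRGVPFRHLLLVLQLALLPAATQ
GKKVVLGKKGDTVELTCTASQKK
"""
header, sequence = parse_fasta(example)[0]
fields = split_uniprot_header(header)
print(fields["accession"], fields["entry_name"])  # P01730 CD4_HUMAN
print(len(sequence))  # 47
```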
Save accession numbers for future use (makes searching quicker):
• RefSeq: NP_000607
• Swiss-Prot: P01730
PDB: Protein Data Bank
• Main database of 3D structures
• Includes ~47,000 entries (proteins, nucleic acids, others)
• Proteins are organized in groups, families, etc.
• Is highly redundant
http://www.rcsb.org
CD4 in complex with gp120
(Figure: structure of CD4 bound to gp120, PDB ID 1G9M)
Organism specific
Model organisms have independent databases, e.g.:
HIV database: http://hiv-web.lanl.gov/content/index
GeneCards
• All-in-one database of human genes (a project of the Weizmann Institute)
• Attempts to integrate as many databases and publications, and as much of the available knowledge, as possible
http://www.genecards.org
Summary
• General and comprehensive databases: NCBI, EMBL, DDBJ
• Genome-specific databases: ENSEMBL, UCSC genome browser
• Highly annotated databases:
  - Human genes: GeneCards
  - Proteins: Swiss-Prot, RefSeq
  - Structures: PDB
The MOST important of all
1. Google (or any search engine)
And always remember:
2. RT(F)M – Read the manual!!
Help!
• Read the Help section
• Read the FAQ section
• Google the question!