
Biological Databases

Date post: 14-Jan-2016
Upload: meryl
Transcript
Page 1: Biological Databases

Biological Databases

Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado

Page 2: Biological Databases

What can be discovered about a gene by a database search?

A little or a lot, depending on the gene

Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.

Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.

Structural information: associated protein structures, fold types, structural domains

Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.

Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases

Page 3: Biological Databases

Using a database

How to get information out of a database:
  Browsing: no targeted information to retrieve
  Search: looking for particular information

Searching a database:
  Must have a key that identifies the element(s) of the database that are of interest.
    Name of gene
    Sequence of gene
    Other information

Helps to have particular informational goals

Page 4: Biological Databases

Searching for information about genes and their products

Gene and gene product databases are often organized by sequence
  Genomic sequence encodes all traits of an organism.
  Gene products are uniquely described by their sequences.
  Similar sequences among biomolecules indicate both similar function and an evolutionary relationship.

Macromolecular sequences provide biologically meaningful keys for searching databases

Page 5: Biological Databases

Searching sequence databases

Start from sequence, find information about it

Many kinds of input sequences
  Could be amino acid or nucleotide sequence
  Genomic or mRNA/cDNA or protein sequence
  Complete or fragmentary sequences

Exact matches are rare (even uninteresting in many cases), so the goal is often to retrieve a set of similar sequences.
  Both small (mutations) and large (required for function) differences within “similar” can be interesting.
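One simple way to put a number on "similar" is percent identity over an existing pairwise alignment. The sketch below is an illustration only, not the scoring BLAST uses; the function name and the convention of using "-" for gaps are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the two strings are already aligned (equal length, '-' marks a gap).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # gap-only column carries no information
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

A single substitution in a four-residue alignment, for example `percent_identity("ACGT", "ACGA")`, leaves three of four columns identical. A gap opposite a residue counts as a mismatch under this convention.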

Page 6: Biological Databases

What might we want to know about a sequence?

Is this sequence similar to any known genes? How close is the best match? Significance?

What do we know about that gene?
  Genomic (chromosomal location, allelic information, regulatory regions, etc.)
  Structural (known structure? structural domains? etc.)
  Functional (molecular, cellular & disease)

Evolutionary information:
  Is this gene found in other organisms?
  What is its taxonomic tree?

Page 7: Biological Databases

NCBI and Entrez

Page 8: Biological Databases

NCBI and Entrez

One of the most useful and comprehensive sources of databases is the NCBI, part of the National Library of Medicine.

NCBI provides interesting summaries, browsers for genome data, and search tools

Entrez is their database search interface
  http://www.ncbi.nlm.nih.gov/Entrez

Can search on gene names, sequences, chromosomal location, diseases, keywords, ...
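The slides use the Entrez web pages, but the same kind of keyword search can be issued programmatically through NCBI's E-utilities. The sketch below only builds an `esearch` request URL and does not contact the network; the helper name `build_esearch_url` and the example search term are assumptions for illustration and do not appear in the original notes.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities "esearch" endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an esearch URL for a keyword query against one Entrez database.

    db     -- Entrez database name, e.g. "gene", "protein", "pubmed"
    term   -- free-text query, as it would be typed into the Entrez search box
    retmax -- maximum number of record IDs to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_BASE + "?" + urlencode(params)


# Hypothetical example query: gene records mentioning alcohol dehydrogenase.
url = build_esearch_url("gene", "alcohol dehydrogenase")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) returns a list of record identifiers, which would then be passed to other E-utilities calls to retrieve the records themselves. NCBI's usage guidelines on request rates apply to any script that does so.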

Page 9: Biological Databases
Page 10: Biological Databases

BLAST: Searching with a sequence

The goal is to find other sequences that are more similar to the query than would be expected by chance (and therefore are homologous).

Can start with nucleotide or amino acid sequence, and search for either (or both)

Many options
  E.g. ignore low information (repetitive) sequence, set significance critical value
  Defaults are not always appropriate: READ THE NCBI EDUCATION PAGES!
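"More similar than would be expected by chance" is usually quantified by the expect value (E-value) from Karlin-Altschul statistics, E = K·m·n·e^(−λS), where S is the raw alignment score, m and n are the query and database lengths, and λ and K depend on the scoring system. The sketch below is a simplified illustration: it ignores the length corrections real BLAST applies, and the λ and K values in the usage lines are made-up placeholders, not the values for any real scoring matrix.

```python
import math


def expect_value(raw_score: float, query_len: int, db_len: int,
                 lam: float, k: float) -> float:
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score, so that E = m * n * 2 ** (-bit_score)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder statistical parameters, chosen only for illustration.
LAM, K = 0.3, 0.1

weak = expect_value(40, query_len=375, db_len=10_000_000, lam=LAM, k=K)
strong = expect_value(200, query_len=375, db_len=10_000_000, lam=LAM, k=K)
print(f"score 40  -> E = {weak:.3g}")
print(f"score 200 -> E = {strong:.3g}")
```

Because E grows with the database size, the same alignment score is less significant in a larger database, which is one reason the "significance critical value" option deserves attention rather than being left at its default.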

Page 11: Biological Databases
Page 12: Biological Databases

Major choices:
  Translation
  Database
  Filters
  Restrictions
  Matrix
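The "Translation" choice corresponds to which BLAST program is run, and that follows from the type of the query and the type of the database. The sketch below encodes the standard mapping; the function name and the `translate_both` flag are assumptions introduced for this example.

```python
def choose_blast_program(query_type: str, db_type: str,
                         translate_both: bool = False) -> str:
    """Pick a BLAST program from the query and database sequence types.

    query_type, db_type -- "nucleotide" or "protein"
    translate_both      -- for nucleotide vs nucleotide searches, compare the
                           six-frame translations of both sides (tblastx)
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"

    programs = {
        ("nucleotide", "nucleotide"): "blastn",   # no translation
        ("protein", "protein"): "blastp",         # no translation
        ("nucleotide", "protein"): "blastx",      # translate the query
        ("protein", "nucleotide"): "tblastn",     # translate the database
    }
    try:
        return programs[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unknown sequence types: {query_type}, {db_type}")
```

Searching at the protein level (blastp, blastx, tblastn, tblastx) tends to be more sensitive for distant relationships than nucleotide comparison, because amino acid sequences diverge more slowly than the DNA that encodes them.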

Page 13: Biological Databases
Page 14: Biological Databases
Page 15: Biological Databases

Close hit: Rat ADH alpha

Page 16: Biological Databases

Distant hit: Human sorbitol dehydrogenase

Page 17: Biological Databases

Parameters (at bottom!)

Page 18: Biological Databases

Click on:

Page 19: Biological Databases
Page 20: Biological Databases

Taxonomy report

(link from “Results of BLAST” page)

Page 21: Biological Databases

What did we just do?

Identify loci (genes) associated with the sequence. Input was Alcohol Dehydrogenase

For each particular “hit”, we can look at that sequence and its alignment in more detail.

See similar sequences, and the organisms in which they are found.

But there’s much more that can be found on these genes, even just inside NCBI…

Page 22: Biological Databases
Page 23: Biological Databases
Page 24: Biological Databases
Page 25: Biological Databases

More from Entrez Gene

Page 26: Biological Databases

And more…

Page 27: Biological Databases

PubMed

Page 28: Biological Databases
Page 29: Biological Databases

Gene Expression

Page 30: Biological Databases

Detailed expression information

Page 31: Biological Databases
Page 32: Biological Databases
Page 33: Biological Databases
Page 34: Biological Databases

NCBI is not all there is...

Links to non-NCBI databases
  Reactome & KEGG for pathways
  HGNC for nomenclature
  UCSC Human Genome Browser

Other important gene/protein resources not linked to:
  UniProt (most carefully annotated)
  PDB (main macromolecular structure repository)

Other key biological data sources
  Gene Ontology/Open Biological Ontologies
  Enzyme

Scientific society: iscb.org
  Journals, Conferences…

Page 35: Biological Databases
Page 36: Biological Databases
Page 37: Biological Databases

Gene Names: Harder than you think…

Page 38: Biological Databases
Page 39: Biological Databases
Page 40: Biological Databases
Page 41: Biological Databases

Take home messages

There are a lot of molecular biology databases, containing a lot of valuable information

Not even the best databases have everything (or the best of everything)

These databases are moderately well cross-linked, and there are “linker” databases

Sequence is a good identifier, maybe even better than gene name!
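One way to see why sequence can work as an identifier: a checksum of the normalized sequence gives the same key no matter which name a record was filed under. This is a minimal sketch under stated assumptions; the record names and sequences are invented for illustration, and `sequence_key` is a hypothetical helper, not part of any database's API.

```python
import hashlib


def sequence_key(seq: str) -> str:
    """Return a stable key for a sequence, ignoring case and whitespace."""
    normalized = "".join(seq.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


# Invented records: two names for one sequence, plus an unrelated sequence.
records = [
    ("geneA", "atggcgtacc\ngtta"),
    ("GENE-A (alias)", "ATGGCGTACCGTTA"),
    ("geneB", "ATGGCGTACCGTTT"),
]

by_sequence = {}
for name, seq in records:
    by_sequence.setdefault(sequence_key(seq), []).append(name)

for key, names in by_sequence.items():
    print(key[:12], names)
```

The two differently named records fall under one key, while the record that differs by a single base gets its own. An exact checksum only groups identical sequences; near-identical sequences (alleles, fragments) still need a similarity search such as BLAST.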

