Ganesha Associates, 24 August 2012

Basic reading, writing and informatics skills for biomedical research

Segment 4. Other types of database and browser


Biological databases

• A database is an indexed collection of information
• Some databases contain mainly text, but others contain image, sequence or structural data
• A browser is a means of visualising this information and the relationships between data elements
• There is a growing amount of information in publicly available databases.
• For example, in 2011 the Nucleic Acids Research journal online Molecular Biology Database Collection listed 1380 databases.
• The National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) host some of the most important databases used for biomedical research.
• Wikipedia also contains a list of biological databases
• Which databases are relevant to your project?


Data, data everywhere…

• “Rapid release of prepublication data has served the field of genomics well.”
• “With close to one million gene-expression data sets now in publicly accessible repositories, researchers can identify disease trends without ever having to enter a laboratory.”
• “Most researchers agree that open access to data is the scientific ideal, so what is stopping it happening [in other fields]?”
• “Earth scientists need better incentives, rewards and mechanisms to achieve free and open data exchange”


The database problem

• Volume of digital data (both high throughput and text)
  – One second of HD video = 2000 pages of text
• Distributed systems and databases, lack of data standards, incompatible data formats
• Costs of creation, curation and maintenance
• Retrieval: semantic search, metadata, images…


The problem – biomedical research

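[Figure: data types used in biomedical research and the separate databases that hold them. Data types: gene expression warehouse, protein, disease, SNP, enzyme, pathway, known gene, sequence cluster, Affy fragment, sequence, metabolite. Sources: LocusLink, MGD, ExPASy SwissProt, PDB, OMIM, NCBI dbSNP, ExPASy Enzyme, KEGG, SPAD, UniGene, GenBank, NMR.]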


Cross-database search today - NCBI


The problem - healthcare

Journal of the American Medical Association (JAMA), Vol 284, No 4, July 26th 2000

• 12,000 deaths/year from unnecessary surgery
• 7,000 deaths/year from medication errors in hospitals
• 20,000 deaths/year from other errors in hospitals
• 80,000 deaths/year from infections in hospitals
• 106,000 deaths/year from non-error, adverse effects of medications

These total 225,000 deaths per year in the US from iatrogenic causes, which ranks them as the third leading cause of death.

Iatrogenic is the term used when a patient dies as a direct result of treatment by a physician, whether from misdiagnosis of the ailment or from adverse reactions to drugs used to treat the illness (drug reactions are the most common cause).


The problem - healthcare

• 17-year innovation adoption curve from discovery into accepted standards of practice
• Even if a standard is accepted, patients have a 50:50 chance of receiving appropriate care, a 5-10% probability of incurring a preventable, anticipatable adverse event
• Medical literature doubling every 19 years
  – Doubles every 22 months for AIDS care
• 2 million facts needed to practice
• Genomics and personalized medicine will increase the problem exponentially
• Typical drug order today with decision support accounts for, at best, Age, Weight, Height, Labs, Other Active Meds, Allergies, Diagnoses
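
The two doubling times quoted above can be compared as yearly growth rates. The sketch below assumes steady exponential growth, which the slide does not state; the function name `annual_growth_rate` is illustrative only.

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Return the yearly fractional growth implied by a doubling time.

    Assumes steady exponential growth: N(t) = N0 * 2 ** (t / T).
    """
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (1 / doubling_time_years) - 1


# Doubling times taken from the slide; 22 months is converted to years.
general_literature = annual_growth_rate(19)
aids_care_literature = annual_growth_rate(22 / 12)

print(f"General medical literature: {general_literature:.1%} per year")
print(f"AIDS care literature: {aids_care_literature:.1%} per year")
```

Under this assumption, a 19-year doubling time corresponds to roughly 3.7% growth per year, while a 22-month doubling time corresponds to roughly 46% per year.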


So how will we find things in databases?

• A search engine collects, indexes, parses, and stores data to facilitate fast and accurate information retrieval.

• Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics (statistics), informatics, physics and computer science.
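
One common way to realise the "collect, parse, index, store" steps is an inverted index, which maps each word to the documents containing it. The sketch below is a minimal illustration, not how any particular biomedical search engine works; the three sample documents and all function names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Parse text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical mini-collection, for illustration only.
DOCS = {
    1: "Mitochondrial P450 monooxygenase activity",
    2: "Immune response to Toll regulated genes",
    3: "Monooxygenase activity in the immune response",
}
INDEX = build_index(DOCS)

print(search(INDEX, "monooxygenase activity"))
print(search(INDEX, "immune response"))
```

Lookup cost depends on the number of query words rather than the size of the collection, which is why indexing is done once, ahead of retrieval.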


Semantic levels

[Table: vocabulary types (Keywords, Dictionary, Controlled vocabulary, Thesaurus, Taxonomy, Ontology) compared by the semantic features each supports: Definition, Synonyms, Classification (is_a), Properties (has_a), Other relations.]


The Gene Ontology organisation

• The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products.

• These terms are to be used as attributes of gene products by collaborating databases, facilitating uniform queries across them.

• The controlled vocabularies of terms are structured to allow both attribution and querying to be at different levels of granularity.

• http://www.geneontology.org
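
Querying "at different levels of granularity" can be sketched with a small is_a hierarchy: a query for a broad term also returns gene products annotated with any more specific term beneath it. The hierarchy below is a simplified illustration rather than an extract from GO, and the gene product names (`geneA`, `geneB`, `geneC`) are hypothetical.

```python
# Simplified is_a edges (child -> parents); illustrative, not copied from GO.
IS_A = {
    "monooxygenase activity": ["oxidoreductase activity"],
    "oxidoreductase activity": ["catalytic activity"],
    "transferase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
}

# Hypothetical gene products and the terms they are annotated with.
ANNOTATIONS = {
    "geneA": ["monooxygenase activity"],
    "geneB": ["transferase activity"],
    "geneC": ["catalytic activity"],
}


def ancestors(term, is_a):
    """Return the term itself plus every term reachable via is_a."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen


def query(term, annotations, is_a):
    """Gene products annotated with the term or any of its descendants."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(term in ancestors(t, is_a) for t in terms)
    )


print(query("monooxygenase activity", ANNOTATIONS, IS_A))
print(query("catalytic activity", ANNOTATIONS, IS_A))
```

The specific query matches only the gene product annotated at that level, while the broad query collects everything annotated at or below it.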


An example of annotation

Mitochondrial P450 (CC24 PR01238; MITP450CC24)

GO cellular component term: mitochondrial inner membrane ; GO:0005743

GO molecular function term: monooxygenase activity ; GO:0004497

GO biological process term: electron transport ; GO:0006118
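
Annotation lines like the three above follow a regular pattern (aspect, term name, GO identifier), so they are easy to turn into structured records. The sketch below assumes that exact line layout; the `GoAnnotation` type and `parse_annotation` function are illustrative names, not part of any GO tool.

```python
import re
from typing import NamedTuple


class GoAnnotation(NamedTuple):
    aspect: str   # e.g. "cellular component"
    name: str     # human-readable term name
    go_id: str    # identifier such as "GO:0005743"


# Matches lines of the form "GO <aspect> term: <name> ; GO:<7 digits>".
PATTERN = re.compile(
    r"GO (?P<aspect>.+?) term:\s*(?P<name>.+?)\s*;\s*(?P<go_id>GO:\d{7})"
)


def parse_annotation(line):
    """Parse one annotation line into a GoAnnotation record."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a GO annotation line: {line!r}")
    return GoAnnotation(match["aspect"], match["name"], match["go_id"])


# The three lines from the slide.
LINES = [
    "GO cellular component term: mitochondrial inner membrane ; GO:0005743",
    "GO molecular function term: monooxygenase activity ; GO:0004497",
    "GO biological process term: electron transport ; GO:0006118",
]
RECORDS = [parse_annotation(line) for line in LINES]

for record in RECORDS:
    print(record.aspect, "->", record.name, f"({record.go_id})")
```

Storing the identifier alongside the name matters because collaborating databases match on the identifier, not on the wording of the term.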

MicroArray data analysis with GO

[Figure: expression in attacked versus control samples over time, with gene groups annotated by GO terms: puparial adhesion, molting cycle, hemocyanin; defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes; immune response, Toll regulated genes; amino acid catabolism, lipid metabolism; peptidase activity, protein catabolism, immune response.]

Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.


GoPubMed

• GoPubMed is a knowledge-based search engine for biomedical texts. The Gene Ontology (GO) and Medical Subject Headings (MeSH) serve as a "table of contents" to structure the millions of articles in the MEDLINE database.

• GoPubMed is one of the first Web 2.0 search engines.

• The system was developed at the Technical University of Dresden by Michael Schroeder and his team and at Transinsight.

• http://www.gopubmed.org


Medline Cognition

Cognition's Semantic NLP understands:

• Word stems - the roots of words
• Words/Phrases - with individual meanings of ambiguous words and phrases listed out
• The morphological properties of each word/phrase, e.g., what type of plural does it take, what type of past tense, how does it combine with affixes like "re" and "ation"
• How to disambiguate word senses - This allows Cognition's technology to pick the correct word meaning of ambiguous words in context
• The synonym relations between word meanings
• The ontological relations between word meanings; one can think of this as a hierarchical grouping of meanings or a gigantic "family tree of English" with mothers, daughters, and cousins
• The syntactic and semantic properties of words. This is particularly useful with verbs, for example. Cognition encodes the types of objects different verb meanings can occur with.


iHOP

Information Hyperlinked over Proteins. iHOP provides the network of genes and proteins as a natural way of accessing the millions of abstracts in PubMed


iHOP

• The minimal information view contains general information, like the symbol, name and organism of a gene. Moreover it provides:
  – Useful links to external resources (e.g. UniProt, NCBI, OMIM, etc.)
  – Links to other iHOP views on this gene
  – Homologues
• Other views contain all sentences found in the literature:
  – For the main gene of a page and other genes (gene B) which interact.
  – That mention the main gene together with relevant biomedical terms such as lymphoma.
• Sentences are ranked by significance, so that screening a few sentences will usually be sufficient to gain an idea of a gene's function.


GenMAPP

• GenMAPP is a free computer application designed to visualize gene expression and other genomic data on maps representing biological pathways and groupings of genes.

• Integrated with GenMAPP are programs to perform a global analysis of gene expression or genomic data in the context of hundreds of pathway MAPPs and thousands of Gene Ontology Terms.


Automatic rendering of pathway interactions


Other ways to search – BLAST, PubChem, UCSC Genome Browser

By sequence – BLAST:

>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACGGACGTGTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCCATGGAGTTCGTGGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAAGCCGGAGCCTTCCTGGGGCTGGGGGGGGGCG
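
The DinoDNA query is written in FASTA format: a header line starting with ">" followed by the sequence itself. The sketch below shows one simple way to read such a record before submitting it to a sequence search; it uses only the first 20 nucleotides of the slide's sequence to keep the example short, and the function names are illustrative rather than taken from any BLAST client.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


# Header from the slide; sequence truncated to its first 20 nucleotides.
EXAMPLE = """>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGA
AGCGAGCAAG
"""

RECORDS = parse_fasta(EXAMPLE)
for name, sequence in RECORDS.items():
    print(name, len(sequence), gc_content(sequence))
```

Keeping the header separate from the sequence matters because search tools treat the header only as a label and compare only the sequence lines.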

By structure – PubChem:


Example of BLAST search results


PC Compound Record


UCSC Genome Browser

• The Genome Browser zooms and scrolls over chromosomes, showing the work of annotators worldwide.

• The Gene Sorter shows expression, homology and other information on groups of genes that can be related in many ways.

• Blat quickly maps your sequence to the genome. The Table Browser provides convenient access to the underlying database.

• VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns.

• Genome Graphs allows you to upload and display genome-wide data sets.
