
Making Sense of Life Sciences Data

Date post: 03-Jan-2016
Page 1: Making Sense of Life Sciences Data

Making Sense of Life Sciences Data

Nigel Martin

21st May 2008

Page 2: Making Sense of Life Sciences Data

Life Sciences Informatics

The development and use of computational methods for the

• acquisition

• management

• analysis and

• interpretation

of biological and medical information to determine biological functions and mechanisms as well as their applications in user communities

This biological and medical information is encoded in the vast amounts of data now generated in the life sciences, e.g. DNA data

Page 3: Making Sense of Life Sciences Data

Life Sciences Informatics

CC AA CC CC TTGG ……

Page 4: Making Sense of Life Sciences Data

Life Sciences Informatics

CC AA CC CC TTGG ……

Homo sapiens

Page 5: Making Sense of Life Sciences Data

[Diagram: gene expression]

• A gene in the genome (made of DNA): permanent copy

• RNA: temporary copy

• Protein: product

• FUNCTION (job) in biological processes
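The flow on this slide from gene (DNA) to RNA to protein can be sketched in a few lines of Python. This is a minimal illustration, not part of the original talk: it assumes the input is the coding strand, uses the standard genetic code, and the nine-base `gene` string is a made-up toy sequence rather than a real gene.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCTAA"          # toy sequence (assumption): permanent copy
mrna = transcribe(gene)     # temporary copy
protein = translate(mrna)   # product
print(mrna, protein)
```

The protein's function is not something a lookup like this can provide, which is where the annotation resources on the later slides come in.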

Page 6: Making Sense of Life Sciences Data

Life Sciences Data is Complex

• The primary data of DNA and protein sequences are held in large repositories such as the EMBL Nucleotide Sequence Database

• The latest release contains 114,475,051 sequences comprising 215,540,553,360 nucleotides

• But life sciences data comprises much besides sequence data…

Page 7: Making Sense of Life Sciences Data

Life Sciences Data is Complex

• e.g. CATH protein structure classification

   

Page 8: Making Sense of Life Sciences Data

Life Sciences Data is Complex

• e.g. herpesvirus evolutionary tree

Page 9: Making Sense of Life Sciences Data

Life Sciences Data is Complex

• e.g. KEGG metabolic pathway

   

Page 10: Making Sense of Life Sciences Data

Life Sciences Data is Complex

• e.g. PubMed medical abstract

Toxicol Appl Pharmacol. 2004 Dec 1;201(2):178-85.

cDNA microarray analysis of rat alveolar epithelial cells following exposure to organic extract of diesel exhaust particles.

Koike E, Hirano S, Furuyama A, Kobayashi T.

Particulate Matter (PM2.5) and Diesel Exhaust Particles (DEP) Research Project, National Institute for Environmental Studies, Tsukuba, Ibaraki, 305-8506, Japan.

Diesel exhaust particles (DEP) induce pulmonary diseases including asthma and chronic bronchitis. Comprehensive evaluation is required to know the mechanisms underlying the effects of air pollutants including DEP on lung diseases. Using a cDNA microarray, we examined changes in gene expression in SV40T2 cells, a rat alveolar type II epithelial cell line, following exposure to an organic extract of DEP. We identified candidate sensitive genes that were up- or down-regulated in response to DEP. The cDNA microarray analysis revealed that a 6-h exposure to the DEP extract (30 μg/ml) increased (>2-fold) the expression of 51 genes associated with drug metabolism, antioxidation, cell cycle/proliferation/apoptosis, coagulation/fibrinolysis, and expressed sequence tags (ESTs), and decreased (<0.5-fold) that of 20 genes. In the present study, heme oxygenase (HO)-1, an antioxidative enzyme, showed the maximum increase in gene expression; and type II transglutaminase (TGM-2), a regulator of coagulation, showed the most prominent decrease among the genes. We confirmed the change in the HO-1 protein level by Western blot analysis and that in the enzyme activity of TGM-2. The organic extract of DEP increased the expression of HO-1 protein and decreased the enzyme activity of TGM-2. Furthermore, these effects of DEP on either HO-1 or TGM-2 were reduced by N-acetyl-l-cysteine (NAC), thus suggesting that oxidative stress caused by this organic fraction of DEP may have induced these cellular responses. Therefore, an increase in HO-1 and a decrease in TGM-2 might be good markers of the biological response to organic compounds of airborne particulate substances.

PMID: 15541757 [PubMed - in process]
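The abstract's selection criteria (expression increased more than 2-fold, or decreased below 0.5-fold) amount to a simple threshold rule. The sketch below illustrates that rule only. The function name, the gene labels and the fold-change values are all hypothetical placeholders, not data from the paper.

```python
def classify_fold_change(fold_change: float, up: float = 2.0, down: float = 0.5) -> str:
    """Label a treated/control expression ratio using the abstract's thresholds."""
    if fold_change > up:
        return "up-regulated"
    if fold_change < down:
        return "down-regulated"
    return "unchanged"


# Invented ratios for three placeholder genes (assumption, for illustration only).
example_ratios = {"gene_a": 3.1, "gene_b": 0.4, "gene_c": 1.2}

for gene, ratio in example_ratios.items():
    print(gene, classify_fold_change(ratio))
```

A real microarray analysis would also need replicate measurements and a statistical test. The thresholds alone only rank candidates.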

Page 11: Making Sense of Life Sciences Data

Life Sciences Data is Complex

• e.g. Gene Ontology http://www.geneontology.org/

• GO:0008150 : biological_process ( 109503 )
• GO:0005575 : cellular_component ( 98453 )
• GO:0003674 : molecular_function ( 108120 )
    • GO:0016209 : antioxidant activity ( 478 )
    • GO:0005488 : binding ( 31317 )
    • GO:0003824 : catalytic activity ( 35260 )
    • GO:0030188 : chaperone regulator activity ( 14 )
    • GO:0030234 : enzyme regulator activity ( 2087 )
    • GO:0005554 : molecular_function unknown ( 29597 )
    • GO:0003774 : motor activity ( 522 )
    • GO:0045735 : nutrient reservoir activity ( 36 )
    • GO:0004871 : signal transducer activity ( 8356 )
    • GO:0005198 : structural molecule activity ( 3428 )
    • GO:0030528 : transcription regulator activity ( 8552 )
        • GO:0017163 : negative regulator of basal transcription activity ( 15 )
        • GO:0003701 : RNA polymerase I transcription factor activity ( 31 )
        • GO:0003702 : RNA polymerase II transcription factor activity ( 982 )
        • GO:0003709 : RNA polymerase III transcription factor activity ( 41 )
        • GO:0030401 : transcription antiterminator activity ( 16 )
        • GO:0003712 : transcription cofactor activity ( 731 )
        • GO:0003700 : transcription factor activity ( 5510 )
        • GO:0016986 : transcription initiation factor activity ( 82 )
        • GO:0016988 : transcription initiation factor antagonist activity ( 9 )
        • GO:0003715 : transcription termination factor activity ( 38 )
        • GO:0016563 : transcriptional activator activity ( 499 )
        • GO:0003711 : transcriptional elongation regulator activity ( 97 )
        • GO:0016564 : transcriptional repressor activity ( 507 )
        • GO:0000156 : two-component response regulator activity ( 394 )
    • GO:0045182 : translation regulator activity ( 687 )
    • GO:0005215 : transporter activity ( 9054 )
    • GO:0030533 : triplet codon-amino acid adaptor activity ( 555 )
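A fragment of the Gene Ontology listing on this slide can be held as a small graph and queried for a term's ancestors. The sketch below is illustrative only. The parent links are an assumption read off the grouping of the listing (the extracted text does not state them explicitly), and each term maps to a list of parents because the Gene Ontology is in general a directed acyclic graph rather than a tree.

```python
# Term names copied from the slide; parent links inferred from its layout (assumption).
NAMES = {
    "GO:0003674": "molecular_function",
    "GO:0005488": "binding",
    "GO:0030528": "transcription regulator activity",
    "GO:0003700": "transcription factor activity",
    "GO:0016563": "transcriptional activator activity",
}

PARENTS = {
    "GO:0003674": [],
    "GO:0005488": ["GO:0003674"],
    "GO:0030528": ["GO:0003674"],
    "GO:0003700": ["GO:0030528"],
    "GO:0016563": ["GO:0030528"],
}


def ancestors(term: str) -> set:
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


for go_id in sorted(ancestors("GO:0003700")):
    print(go_id, NAMES[go_id])
```

Ancestor queries of this kind are what allow a gene annotated with a specific term to be counted under every more general term, which is how the bracketed counts on the slide grow towards the root.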

Page 12: Making Sense of Life Sciences Data

Life Sciences Informatics in Birkbeck Comp Sci

Example Research Areas:

• Evolutionary analysis: reconstruction of evolutionary events from genomic and related data

• Integration of life sciences data: data and knowledge management techniques to support the integration, analysis, mining and visualisation of life sciences data

• Medical informatics: data integration, semantic modelling, fuzzy inferencing and data mining techniques to support virtual integration of medical records

For full details of topics, people, projects, publications…

http://www.dcs.bbk.ac.uk/research/bioinf

Page 13: Making Sense of Life Sciences Data

Evolutionary Analysis

• Annotating evolutionary trees

Mathematical models and algorithms addressing problems such as:

• Given an evolutionary species tree and a set of trees built on the same extant species according to similarity between individual gene families, find a mapping of the individual gene trees onto the species tree exhibiting gene duplications and losses to account for the differences

• Given an evolutionary species tree and patterns of presence/absence of genes in the extant species, compute evolutionary scenarios of gene gain, horizontal transfer and loss events to account for the patterns
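The first problem above, mapping a gene tree onto a species tree, is often introduced through the textbook "last common ancestor" (LCA) reconciliation. The sketch below uses that textbook method, which is not necessarily the model used in the research described here. Trees are nested tuples, the species names, gene names and `species_of` mapping are invented, and only duplications are counted (losses are left out for brevity).

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple binary tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaf_set(left) | leaf_set(right)


def lca(species_tree, species):
    """Return the smallest subtree of species_tree containing all given species."""
    node = species_tree
    while not isinstance(node, str):
        left, right = node
        if species <= leaf_set(left):
            node = left
        elif species <= leaf_set(right):
            node = right
        else:
            break
    return node


def count_duplications(gene_tree, species_tree, species_of):
    """Return (species-tree node the gene tree root maps to, number of duplications)."""
    if isinstance(gene_tree, str):
        return lca(species_tree, frozenset([species_of[gene_tree]])), 0
    left, right = gene_tree
    left_map, left_dups = count_duplications(left, species_tree, species_of)
    right_map, right_dups = count_duplications(right, species_tree, species_of)
    here = lca(species_tree, leaf_set(left_map) | leaf_set(right_map))
    # A gene node mapping to the same species node as one of its children is a duplication.
    is_duplication = here == left_map or here == right_map
    return here, left_dups + right_dups + int(is_duplication)


species_tree = (("A", "B"), "C")                        # hypothetical species
gene_tree = (("a1", "b1"), ("a2", "c1"))                # hypothetical gene family
species_of = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}

root_map, duplications = count_duplications(gene_tree, species_tree, species_of)
print(duplications)
```

In this toy case the gene tree disagrees with the species tree because species A carries two copies, and the mapping explains that with a single duplication at the root.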

Page 14: Making Sense of Life Sciences Data

Evolutionary Analysis

• Applied to the analysis of evolutionary gains and losses of functions in herpesvirus genomes

Reconstructed history of HPF161 Host–virus interaction
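A gain and loss history of this kind can be illustrated with Fitch parsimony on a presence/absence pattern. This is a deliberately simplified stand-in: it counts the minimum number of gain or loss events on a fixed species tree, ignores horizontal transfer, and is not claimed to be the method behind the reconstruction shown. The tree and the presence pattern are invented.

```python
def fitch(tree, present):
    """Return (candidate ancestral states, minimum number of gain/loss events).

    `tree` is a nested-tuple binary tree of species names and `present`
    maps each species to True (gene present) or False (gene absent).
    """
    if isinstance(tree, str):
        return {present[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, present)
    right_states, right_cost = fitch(right, present)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # The children disagree, so one gain or loss event is needed below this node.
    return left_states | right_states, left_cost + right_cost + 1


species_tree = ((("A", "B"), "C"), "D")                        # hypothetical species
present = {"A": True, "B": True, "C": False, "D": False}      # hypothetical pattern

root_states, events = fitch(species_tree, present)
print(root_states, events)
```

Here one event suffices: the gene is absent at the root and gained once on the branch leading to A and B.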

Page 15: Making Sense of Life Sciences Data

Integration of Life Sciences Data

• Integrating transcriptomics and structural data to reveal protein functions: BioMap

• A data warehouse to support analysis and mining integrating data including microarray gene expression data, protein structure data, CATH structural classification data, functional data including Gene Ontology, KEGG (Gene, Orthology, Genome, Pathway…)

• Creation of a pilot Grid for proteomics resources: ISpider

• An integrated platform of proteomics resources supporting techniques for distributed querying, workflows and data analysis tasks in a Grid

• Research approach based on semantic mapping services using the techniques developed in the AutoMed project http://www.doc.ic.ac.uk/automed/

Page 16: Making Sense of Life Sciences Data

Integrated Proteomics Informatics Platform - Architecture

[Architecture diagram. Recoverable labels:]

• Layers: ISPIDER Proteomics Clients; ISPIDER Proteomics Grid Infrastructure; Existing E-Science Infrastructure; Public Proteomic Resources

• Clients: Vanilla Query Client; 2D Gel Visualisation Client + Aspergil. Extensions + Phosph. Extensions; PPI Validation + Analysis Client; Protein ID Client

• Services: Proteome Request Handler; Instance Ident/Mapping Services; Proteomic Ontologies/Vocabularies; Source Selection Services; Data Cleaning Services

• Infrastructure components: myGrid Ontology Services; myGrid DQP; DAS; AutoMed; myGrid Workflows

• Resources (Existing Resources and ISPIDER Resources), each exposed through web services (WS): PS, PF, TR, GS, FA, PPI, PID, PRIDE, PEDRo, Phos

• Work packages marked on the diagram: WP1 to WP6

KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package

Page 17: Making Sense of Life Sciences Data

Medical Informatics

• ASsociation Studies assisted by Inference and Semantic Technologies – ASSIST

• 10 E.U. partners: U.K., Greece, Belgium, Germany, Spain

The main objectives of ASSIST are to:

• Allow researchers to combine phenotypic and genotypic data

• Unify multiple patient records repositories

• Automate the process of evaluating medical hypotheses

• Provide an inference engine capable of statistically evaluating medical data

• Offer expressive, graphical tools for medical researchers to post their queries.

Page 18: Making Sense of Life Sciences Data

Medical Informatics

[Architecture diagram. Recoverable labels: Web Interface; SeRQL query; SeRQL result; Virtual Integrated OWL Schema; Medical Rules Repository (First-Order Logic); IQL query; expanded IQL query; AutoMed Query Processor; AutoMed Metadata Repository; Virtual Integrated Relational Schema; AutoMed transformation pathways; AUTh Relational Schema; Charite Relational Schema; Ghent Relational Schema; AutoMed Wrappers (JDBC/Grid Services); data sources AUTh (Greece), Charite (Germany), Ghent (Belgium)]

• ASSIST query processing builds on AutoMed technology with integrated ontology and inference rules capabilities
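The idea of querying several hospital databases through one virtual integrated schema can be sketched in plain Python. This does not use AutoMed or its API and does not reproduce its transformation pathways. It only shows a query written against a shared schema being answered from sources that name their fields differently. The three source names come from the slide. Every field name, mapping and row below is invented for illustration.

```python
# Hypothetical rows in each site's local schema (all names and values invented).
SOURCES = {
    "AUTh": [{"pid": 1, "age_years": 52}],
    "Charite": [{"patient_no": 7, "alter": 38}],
    "Ghent": [{"id": 3, "leeftijd": 45}],
}

# Per-source mapping from local field names to the virtual integrated schema.
MAPPINGS = {
    "AUTh": {"pid": "patient_id", "age_years": "age"},
    "Charite": {"patient_no": "patient_id", "alter": "age"},
    "Ghent": {"id": "patient_id", "leeftijd": "age"},
}


def to_virtual(source, row):
    """Rename a local row's fields into the virtual schema and tag its origin."""
    mapping = MAPPINGS[source]
    virtual_row = {mapping[field]: value for field, value in row.items()}
    virtual_row["source"] = source
    return virtual_row


def query(predicate):
    """Evaluate a predicate written against the virtual schema over all sources."""
    results = []
    for source, rows in SOURCES.items():
        for row in rows:
            virtual_row = to_virtual(source, row)
            if predicate(virtual_row):
                results.append(virtual_row)
    return results


over_40 = query(lambda row: row["age"] > 40)
print([(row["source"], row["patient_id"]) for row in over_40])
```

The integration is virtual in the sense that no merged copy of the records is stored. Rows are translated only when a query is evaluated.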

Page 19: Making Sense of Life Sciences Data

Making Sense of Life Sciences Data

• Some areas of on-going and future research

• automated reasoning using ontologies and wider domain knowledge

• evolutionary reconstruction exploiting domain knowledge

• analysis and mining of heterogeneous distributed resources

• metrics for data integration quality

• The overarching motivation is the potential to make scientific discoveries that can improve quality of life

Page 20: Making Sense of Life Sciences Data

Some Collaborators

Funding

Further Information

http://www.dcs.bbk.ac.uk/research/bioinf

