Integration of biomedical data and electronic publications

10th International Symposium on Electronic Theses and Dissertations, Uppsala University, Uppsala, Sweden, June 13-16, 2007

Integration of biomedical data and electronic publications

Lars Juhl Jensen, EMBL Heidelberg

printed publications

dead wood

electronic publications

virtual dead wood

de Lichtenberg et al., Science, 2005

small font sizes

“no”

Jensen et al., Nature Reviews Genetics, 2006

small font sizes

hyperlinks

“no”

“hell no”

why?

archival

reanalysis

data mining

reader interaction

what?

raw data

processed data

final data

“facts”

where?

part of the document

too much data

too coarse grained

escalates the problem

institutional repositories

too many types of data

lack of standardization

difficult to download all data

public databases

specialization

standardization

mandatory deposition

easy to download all data

cross references

examples from biomedicine

GenBank

17.9 million sequences

80 billion nucleotides

UniProt

4.7 million sequences

Ensembl

35 complete genomes

PDB

44000 protein structures

GEO

5800 data sets

152000 samples

ArrayExpress

1800 data sets

BioGRID

186000 interactions

129000 proteins

MINT

103000 interactions

28000 proteins

PubChem

7.5 million compounds

PubMed Central

330 open access journals

12000 open access papers

downloadable

standardized formats

cross-referenced
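
As a rough illustration of what cross-referencing between publications and database records makes possible, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the database names, accession strings, paper identifiers, and the `index_by_paper` helper are hypothetical and do not come from the talk or from any real database.

```python
from collections import defaultdict

# Hypothetical cross-reference records: (database, accession, citing paper).
# All names and identifiers below are invented for illustration only.
RECORDS = [
    ("SequenceDB", "SEQ0001", "paper-A"),
    ("StructureDB", "STR0042", "paper-A"),
    ("ExpressionDB", "EXP0007", "paper-B"),
]


def index_by_paper(records):
    """Group (database, accession) pairs under the paper that cites them."""
    index = defaultdict(list)
    for database, accession, paper_id in records:
        index[paper_id].append((database, accession))
    return dict(index)


links = index_by_paper(RECORDS)
# Look up every data record linked to one publication.
print(links["paper-A"])
```

With records in a shared, downloadable format, an index like this could let a reader move from a paper to the data behind it, which is the kind of reader interaction the slides point to.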

archival

reanalysis

data mining

reader interaction

thank you!