
Value-adding, Access, and Use: Biological Databases as a Case Study.

Date post: 04-Jan-2016
Upload: alexina-moody
Transcript
Page 1: Value-adding, Access, and Use: Biological Databases as a Case Study.

Value-adding, Access, and Use: Biological Databases as a Case Study

Page 2

Genes…

Page 3

…make proteins

Page 4

Proteins form complex 3D structures

Page 5

Molecules interact

Page 6

The right molecules need to be present at the right time

Page 7

Page 8

EMBL-Bank: DNA sequences

Page 9

EMBL-Bank: DNA sequences
SWISS-PROT + TrEMBL
InterPro

Page 10

EMBL-Bank: DNA sequences
SWISS-PROT + TrEMBL
InterPro
EnsEMBL: Metazoan Genome Gene Annotation

Page 11

EMBL-Bank: DNA sequences
SWISS-PROT + TrEMBL
InterPro
EnsEMBL: Metazoan Genome Gene Annotation
Array-Express: Microarray Expression Data

Page 12

EMBL-Bank: DNA sequences
SWISS-PROT + TrEMBL
InterPro
EnsEMBL: Metazoan Genome Gene Annotation
Array-Express: Microarray Expression Data

Page 13

EMBL-Bank: DNA sequences
SWISS-PROT + TrEMBL
InterPro
EnsEMBL: Metazoan Genome Gene Annotation
Array-Express: Microarray Expression Data
EMSD: Macromolecular Structure Data

Page 14

EMBL-Bank: DNA sequences
SWISS-PROT + TrEMBL
InterPro
EnsEMBL: Metazoan Genome Gene Annotation
Array-Express: Microarray Expression Data
EMSD: Macromolecular Structure Data

Page 15

EnsEMBL
EMBL-Bank: DNA sequences
Array-Express: Microarray Expression Data
IntAct: Protein-Protein Interaction Data
SWISS-PROT + TrEMBL
InterPro
EMSD: Macromolecular Structure Data

Page 16

Page 17

Page 18

Integr8

Page 19

EnsEMBL
EMBL-Bank: DNA sequences
Array-Express: Microarray Expression Data
IntAct: Protein-Protein Interaction Data
SWISS-PROT + TrEMBL
InterPro
EMSD: Macromolecular Structure Data

Page 20

EMBL-Bank: DNA sequences
IntAct: Protein-Protein Interaction Data
SWISS-PROT + TrEMBL
InterPro

Page 21

Running a database project

Service Tools
End Users
Service DB
Submitters
Add value (computation)
Add value (review etc.)
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates

Page 22

Running a database project

Service Tools
End Users
Service DB
Submitters
Add value (computation)
Add value (review etc.)
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates
Production DB

Page 23

Running a database project

Service Tools
End Users
Service DB
Submitters
Submission tools
Add value (computation)
Add value (review etc.)
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates
Production DB

Page 24

Running a database project

Service Tools
End Users
Service DB
Production DB
Submitters
Submission tools
Add value (computation)
Add value (review etc.)
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates

Page 25

Running a database project

Service Tools
End Users
Service DB
Production DB
Submitters
Submission tools
Add value (computation)
Add value (review etc.)
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates

Page 26

Running a database project

Service Tools
End Users
Service DB
Production DB
Submitters
Submission tools
Add value (computation)
Add value (review etc.)
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates

Page 27

Running a database project

Data Distrib.
Service Tools
End Users
Service DB
Production DB
Submitters
Submission tools
Add value (computation)
Add value (review etc.)
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates

Page 28

Running a database project

Data Distrib.
Service Tools
End Users
Service DB
Production DB
Submitters
Submission tools
Add value (review etc.)
Data exchange
Other archives
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates

Page 29

Running a database project

Data Distrib.
Service Tools
End Users
Service DB
Production DB
Development DB
Submitters
Submission tools
Add value (review etc.)
Data exchange
Other archives
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates

Page 30

Running a database project

Data Distrib.
Service Tools
End Users
Service DB
Production DB
Development DB
Submitters
Submission tools
Add value (computation)
Add value (review etc.)
Data exchange
Other archives
Q/C etc.
Database design
Releases & Updates
Genomes, Genes, Patents, Updates

Page 31

EMBL nucleotide sequence database

Page 32

Dataflow

Page 33

EMBL Flat File

ID   SLD746     standard; DNA; PRO; 477 BP.
XX
AC   D83746;
XX
NI   g1772347
XX
DT   18-JAN-1997 (Rel. 50, Created)
DT   17-FEB-1997 (Rel. 50, Last updated, Version 2)
XX
DE   Streptomyces lividans DNA for ribosomal protein S12, complete cds.
XX
KW   ribosomal protein S12.
XX
OS   Streptomyces lividans
OC   Eubacteria; Firmicutes; Actinomycetes; Streptomycetes;
OC   Streptomycetaceae; Streptomyces.
XX
RN   [1]
RP   1-477
RA   Shima J.;
RT   ;
RL   Submitted (06-MAR-1996) to the EMBL/GenBank/DDBJ databases.
RL   Jun Shima, National Food Research Institute; Kannondai 2-1-2, Tsukuba,
RL   Ibaraki 305, Japan (E-mail:[email protected], Tel:0298-38-8124,
RL   Fax:0298-38-7996)
XX
DR   SPTREMBL; P97222; P97222.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..477
FT                   /organism="Streptomyces lividans"
FT                   /strain="TK21"
FT   CDS             28..399
FT                   /db_xref="PID:g1772348"
FT                   /db_xref="SPTREMBL:P97222"
FT                   /product="ribosomal protein S12"
FT                   /translation="MPTIQQLVRKGRQDKVEKNKTPALEGSPQRRGVCTRVFTTTPKKP
FT                   NSALRKVARVRLTSGIEVTAYIPGEGHNLQEHSIVLVRGGRVKDLPGVRYKIIRGSLDT
FT                   QGVKNRKQARSRYGAKKEK"
FT   mutation        289
FT                   /replace="g"
FT                   /phenotype="streptomycin resistant mutant TK24"
XX
SQ   Sequence 477 BP; 99 A; 153 C; 152 G; 73 T; 0 other;
     ATTCGGCACA CAGAAACCGG AGAAGTAGTG CCTACGATCC AGCAGCTGGT CCGGAAGGGC        60
     CGGCAGGACA AGGTCGAGAA GAACAAGACG CCCGCACTCG AGGGTTCGCC CCAGCGCCGT       120
     GGCGTCTGCA CGCGTGTGTT CACGACCACC CCGAAGAAGC CGAACTCGGC CCTGCGTAAG       180
     GTCGCGCGTG TGCGTCTGAC CAGTGGGATC GAGGTCACCG CTTACATTCC GGGTGAGGGG       240
     CACAACCTGC AGGAGCACTC CATCGTGCTC GTGCGCGGCG GCCGTGTGAA GGACCTGCCG       300
     GGTGTTCGCT ACAAGATCAT CCGCGGTTCG CTTGACACCC AGGGTGTGAA GAACCGCAAG       360
     CAGGCCCGCA GCCGCTACGG CGCCAAGAAG GAGAAGTAAG AATGCCTCGT AAGGGCCCCG       420
     CCCCGAAGCG CCCGGTCATC ATCGACCCGG TCTACGGTTC TCCTCTGGTG ACCTCCC          477
//
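The flat-file layout on this slide lends itself to simple line-oriented parsing: each line starts with a two-letter line code (ID, AC, DE, FT, SQ, ...), XX lines are spacers, and // ends the entry. The following is only a minimal sketch under those assumptions, not the EBI's own parser; the function name parse_embl_entry and the TEST01/X00000 sample record are made up for illustration.

```python
from collections import defaultdict


def parse_embl_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code.

    Returns (fields, sequence): fields maps a line code to the list of line
    bodies carrying that code; sequence is the concatenated base string.
    """
    fields = defaultdict(list)
    bases = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # entry terminator
            break
        if in_sequence:
            # Sequence lines hold blocks of bases plus a running position
            # count; keep only the letters.
            bases.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, body = line[:2], line[5:].rstrip()
        if code == "XX" or not line.strip():   # spacer line
            continue
        fields[code].append(body)
        if code == "SQ":                   # everything after SQ is sequence
            in_sequence = True
    return dict(fields), "".join(bases).upper()


# Hypothetical miniature entry, invented for this example only.
SAMPLE = """\
ID   TEST01     standard; DNA; PRO; 12 BP.
XX
AC   X00000;
XX
DE   Made-up entry for illustration only.
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     acgtacgtac gt                                                            12
//
"""

fields, sequence = parse_embl_entry(SAMPLE)
print(fields["AC"], len(sequence))
```

Grouping by line code keeps the sketch short; a fuller parser would also need to join continuation lines in the FT feature table, which this sketch leaves as separate strings.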

Page 34

EMBL Relational Schema

VARIATIONFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o FREQUENCY o REPLACE * USERSTAMP * TIMESTAMP

UNCLASSIFIEDFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o USERSTAMP o TIMESTAMP

TRANSLATIONEXCEPTION @ PRDB1 (#)
* AMINOACID * FEATID * TRANSXEND# * TRANSXID * TRANSXSTART o BIOSEQID * USERSTAMP * TIMESTAMP

TRANSCRIPTFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o ENUMBER o FIVE_CONS o PSEUDO o READINGFRAME o THREE_CONS * USERSTAMP * TIMESTAMP

THESIS @ PRDB1 (#)
* INSTITUTE## * PUBID o ADVISOR o DEGREE * USERSTAMP * TIMESTAMP

SUBMISSIONREF @ PRDB1 (#)
# * PUBID o DS# o MEDIUM * USERSTAMP * TIMESTAMP

SOURCEFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o ORGANISM o CHROMOSOME o FREQUENCY o HAPLOTYPE o NUCSOURCE o ORGANELLE o SEQUENCED o SEX o VIRION * USERSTAMP * TIMESTAMP

SIGNALFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o DIRECTION o PSEUDO * USERSTAMP * TIMESTAMP

SEQFEATURE_NOTE @ PRDB1
* FEATID * LINE# * NOTE# * TEXT o USERSTAMP o TIMESTAMP

SEQFEATURE @ PRDB1 (#)
* BIOSEQID# * FEATID * FTYPE# * LOCATION * ORDER_IN o EVIDENCE o FEAT_LABEL o INCOMPLETE o SUBMITTOR * USERSTAMP * TIMESTAMP

RPTUNIT @ PRDB1
* FEATID o RPTID o LABEL o RPTEND o RPTSTART * USERSTAMP * TIMESTAMP

RNAFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o ANTICODONAA o ANTICODONEND o ANTICODONSTART o PSEUDO * USERSTAMP * TIMESTAMP

PUB_XREF @ PRDB1
* PUBID * DBCODE * PRIMARYID

PUBLICATION @ PRDB1 (#)
# * PUBID * PUBLANG * PUBSTATUS * PUBTYPE o PUBDATE o TITLE * USERSTAMP * TIMESTAMP

PUBAUTHOR @ PRDB1
* ORDERIN * PERSON * PUBID o EDITORFLAG * USERSTAMP * TIMESTAMP

PROTEINSEQ @ PRDB1 (#)
# * SEQID o DERIVED o MOLWEIGHT * USERSTAMP * TIMESTAMP

PROTEINCODINGFEATURE @ PRDB1 (#)
# * FEATID * FKEY# * PROTEINSEQID o ENUMBER o PSEUDO o READINGFRAME o TRANSL_TABLE_ID * USERSTAMP * TIMESTAMP

PHYSICALSEQ @ PRDB1 (#)
# * PHYSEQID * SEQTEXT * USERSTAMP * TIMESTAMP

PERSONALCOMM @ PRDB1
* ADDRESS * PUBID * RECIPIENT o PHONEID * USERSTAMP * TIMESTAMP

PERSON @ PRDB1 (#)
# * PERSONID * SURNAME o FIRSTNAME o MIDINITIALS * USERSTAMP * TIMESTAMP

PATENT_BIOSEQ @ PRDB1 (#)
* ORDERIN# * PUBID# * SEQID * USERSTAMP * TIMESTAMP

PATENTPRIORITY @ PRDB1
* PRIORITY_DATE * PRIORITY_NO * PRIORITY_OFFICE * PRIORITY_ORDER * PUBID * USERSTAMP * TIMESTAMP

PATENTCLASS @ PRDB1
* CLASS * CLASS_ORDER * PUBID * USERSTAMP * TIMESTAMP

PATENTAPPLICANT @ PRDB1
* APPNAME * ORDERIN * PUBID * USERSTAMP * TIMESTAMP

PATENTABSTRACT @ PRDB1 (#)
* ABSTRACT# * PUBID * USERSTAMP * TIMESTAMP

PATENT @ PRDB1 (#)
* DOCNUM * DOCOFFICE * DOCTYPE# * PUBID o APPDATE o APPNUM o APPOFFICE * USERSTAMP * TIMESTAMP

NUCSTRUCTUREFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o ENUMBER o FREQUENCY o MODBASE o ORGANISM o RPTFAMILY * USERSTAMP * TIMESTAMP

NUCSEQ @ PRDB1 (#)
* A_COUNT * C_COUNT * G_COUNT * OTHER_COUNT# * SEQID * T_COUNT o STRAND o TOPOLOGY * USERSTAMP * TIMESTAMP

NTX_TAX_NODE @ PRDB1 (#)
# * TAX_ID o PARENT_ID o RANK o EMBL_CODE o DIV_ID * INHERIT_DIV_ID o GC_ID o INHERIT_GC_ID o MGC_ID o INHERIT_MGC_ID * HIDDEN o NO_SEQUENCE o REMARK

NTX_SYNONYM @ PRDB1
* TAX_ID * NAME_TXT o UNIQUE_NAME o NAME_CLASS * UPPER_NAME_TXT

NTX_RANK @ PRDB1 (#)
o RANK_ID# * RANK_TXT

NTX_CLASS @ PRDB1 (#)
o CLASS_CODE# * CLASS_TEXT o PRIORITY

LOCATION_TREE @ PRDB1 (#)
# * LOCNODEID * ORDER_IN o COMPLEMENT o END_FSIZE o END_FTYPE o GAP_FSIZE o GAP_FTYPE o LITERAL o OPERATOR o PARENTID o REPLACE o REPL_STRING o SEG_END o SEG_GAP o SEG_START * SEQID o START_FSIZE o START_FTYPE * USERSTAMP * TIMESTAMP

KEYWORD_SYNONYM @ PRDB1 (#)
# * KEYWORDID1# * KEYWORDID2

KEYWORD @ PRDB1 (#)
* COMPRESSED_KW * KEYWORD# * KEYWORDID o DBCODE o DESCRIPTION * USERSTAMP * TIMESTAMP

JOURNALARTICLE @ PRDB1 (#)
* FIRSTPAGE * ISSN# * PUBID * VOLUME o ARTICLETYPE o ISSUE o LASTPAGE o ORDER_ON_PAGE o SUPPLEMENT * USERSTAMP * TIMESTAMP

INSTITUTE @ PRDB1 (#)
# * INSTITUTE# * INSTITUTE_NAME * USERSTAMP * TIMESTAMP

IMMUNOFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o CHIMERIC o PSEUDO o READINGFRAME o TRANSL_TABLE_ID * USERSTAMP * TIMESTAMP

GENE @ PRDB1 (#)
# * GENEID * GENENAME o DBCODE o EXTDBID o ORGANISM o PATHOLOGY o PRODUCT * USERSTAMP * TIMESTAMP

FEATURE_RELATIONSHIP @ PRDB1 (#)
# * FEATID1# * FEATID2# * RELATION * USERSTAMP * TIMESTAMP

FEATURE_QUALIFIERS @ PRDB1 (#)
# * FEATID# * ORDER_ON * QUAL# o TEXT o USERSTAMP o TIMESTAMP

ERROR_QUALIFIERS @ PRDB1 (#)
# * FEATID# * ORDER_ON

DBENTRY_KEYWORD @ PRDB1
* KEYWORDID * DBENTRYID o USERSTAMP o TIMESTAMP

DBENTRY_DESCR @ PRDB1 (#)
# * DBENTRYID# * LINE# * TEXT o USERSTAMP o TIMESTAMP

DBENTRY_COMMENT @ PRDB1 (#)
# * COMMENTID# * DBENTRYID * USERSTAMP * TIMESTAMP

DBENTRY @ PRDB1 (#)
* BIOSEQID# * DBENTRYID * ENTRY_NAME * ENTRY_STATUS * PRIMARYACC# * VERSION# o ANN_DATE o CLEAN_LISTING o CONFIDENTIAL o DBCODE o EXT_DATE o EXT_VER o FIRST_CREATED o FIRST_PUBLIC o HOLD_DATE o MISSING_PAPER o PROJECT# o SUBMIT_TOOL o WAIT_FOR_PAPER * USERSTAMP * TIMESTAMP o FFDATE

DATABASE_XREF @ PRDB1
* EBI_DB * ACC# o NID_TEXT o PID_TEXT * EXT_DB * PRIMARYID o SECONDARYID

COMMENT_TEXT @ PRDB1 (#)
# * COMMENTID# * LINE# * TOPICTYPE o PRIVATE o TEXT. . .

CODONEXCEPTION @ PRDB1 (#)
* AMINOACID * CODONSEQ# * CODONXID * FEATID o USERSTAMP o TIMESTAMP

CITATIONSEQFEATURE @ PRDB1 (#)
# * FEATID# * PUBID# * SEQID * USERSTAMP * TIMESTAMP

CITATIONBIOSEQ @ PRDB1 (#)
* ORDERIN# * PUBID# * SEQID o CITCOMMENT o FULLSEQ o LOCNODEID o LOCTYPE * USERSTAMP * TIMESTAMP

BOOK @ PRDB1 (#)
* BOOKTITLE * FIRSTPAGE * LASTPAGE# * PUBID * PUBLISHER o EDITION o ISBN o PUBPLACE o SERIES o VOLUME * USERSTAMP * TIMESTAMP

BIOSEQ @ PRDB1 (#)
* BIOSEQTYPE * CHKSUM * MOLECULETYPE# * SEQID * SEQLEN o DDBJSID o EBISID o LOGSEQ o NCBIGI o PHYSEQ * USERSTAMP * TIMESTAMP

ACCPAIR @ PRDB1 (#)
# * PRIMARY# * SECONDARY * USERSTAMP * TIMESTAMP

ACCEPTED @ PRDB1 (#)
* ISSN# * PUBID o ARTICLETYPE o FIRSTPAGE o ISSUE o LASTPAGE o ORDER_ON_PAGE o SUPPLEMENT o VOLUME * USERSTAMP * TIMESTAMP

FK_ACC
FK_ACCPAIR_127
FK_BIOSEQ_4
FK_BIOSEQ_5
FK_BOOK_59
FK_CITATIONBIOSEQ_60
FK_CITATIONBIOSEQ_61
FK_CITATIONSEQFEATURE_10
FK_CITATIONSEQFEATURE_92
FK_CODONEXCEPTION_50
FK_DBENTRY_25
FK_DBENTRY_COMMENT_100
FK_DBENTRY_DESCR_117
FK_DBENTRY_KEYWORD_39
FK_DBENTRY_KEYWORD_40
FK_ERROR_QUALIFIERS_122
FK_FEATURE_QUALIFIERS_121
FK_FEATURE_RELATIONSHIP_28
FK_FEATURE_RELATIONSHIP_29
FK_GENE_64
FK_IMMUNOFEATURE_7
FK_JOURNALARTICLE_66
FK_LOCATION_TREE_68
FK_LOCATION_TREE_72
FK_NAME_CLASS
FK_NTX_SYNONYM_46
FK_NUCSEQ_76
FK_NUCSTRUCTUREFEATURE_55
FK_NUCSTRUCTUREFEATURE_57
FK_PATENTABSTRACT_1
FK_PATENTAPPLICANT_2
FK_PATENTCLASS_98
FK_PATENTPRIORITY_97
FK_PATENT_77
FK_PATENT_BIOSEQ_107
FK_PATENT_BIOSEQ_108
FK_PERSONALCOMM_3
FK_PROTEINCODINGFEATURE_12
FK_PROTEINCODINGFEATURE_13
FK_PROTEINSEQ_80
FK_PROTEINSEQ_81
FK_PUBAUTHOR_82
FK_PUBAUTHOR_83
FK_PUB_XREF_1
FK_RANK
FK_RNAFEATURE_15
FK_RPTUNIT_109
FK_SEQFEATURE_35
FK_SEQFEATURE_36
FK_SEQFEATURE_NOTE_78
FK_SIGNALFEATURE_18
FK_SOURCEFEATURE_20
FK_SOURCEFEATURE_21
FK_SUBMISSIONREF_119
FK_THESIS_106
FK_THESIS_84
FK_THESIS_85
FK_TRANSCRIPTFEATURE_52
FK_TRANSLATIONEXCEPTION_110
FK_TRANSLATIONEXCEPTION_51
FK_UNCLASSIFIEDFEATURE_118
FK_UNPUBLISHED_114
FK_VARIATIONFEATURE_48

Location Info
Feature Info
Taxonomy Info
Reference Info
Sequence Info
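To make the schema slide more concrete, here is a hedged sketch of how three of the listed tables (BIOSEQ, DBENTRY, SEQFEATURE) might fit together. The slide gives only table and column names, so the column types, the choice of primary keys, the foreign-key links, and the use of SQLite via Python's sqlite3 module (instead of the PRDB1 production database) are all assumptions made for illustration. Only a small subset of columns is used, and PRIMARYACC# is written here as PRIMARYACC. The sample rows reuse values from the SLD746 flat-file entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed types and keys; the slide lists names only.
CREATE TABLE BIOSEQ (
    SEQID        INTEGER PRIMARY KEY,
    MOLECULETYPE TEXT    NOT NULL,
    SEQLEN       INTEGER NOT NULL
);
CREATE TABLE DBENTRY (
    DBENTRYID  INTEGER PRIMARY KEY,
    PRIMARYACC TEXT    NOT NULL,
    ENTRY_NAME TEXT    NOT NULL,
    BIOSEQID   INTEGER NOT NULL REFERENCES BIOSEQ(SEQID)
);
CREATE TABLE SEQFEATURE (
    FEATID   INTEGER PRIMARY KEY,
    BIOSEQID INTEGER NOT NULL REFERENCES BIOSEQ(SEQID),
    FTYPE    TEXT    NOT NULL,
    LOCATION TEXT    NOT NULL
);
""")

# One sequence, one entry, three features (values from the flat-file slide).
conn.execute("INSERT INTO BIOSEQ VALUES (1, 'DNA', 477)")
conn.execute("INSERT INTO DBENTRY VALUES (1, 'D83746', 'SLD746', 1)")
conn.executemany(
    "INSERT INTO SEQFEATURE VALUES (?, ?, ?, ?)",
    [(1, 1, "source", "1..477"),
     (2, 1, "CDS", "28..399"),
     (3, 1, "mutation", "289")],
)

# Rebuild the feature table of an entry by joining through the shared sequence.
rows = conn.execute(
    """
    SELECT f.FTYPE, f.LOCATION
    FROM DBENTRY e
    JOIN SEQFEATURE f ON f.BIOSEQID = e.BIOSEQID
    WHERE e.PRIMARYACC = ?
    ORDER BY f.FEATID
    """,
    ("D83746",),
).fetchall()
print(rows)
```

The point of the sketch is the normalisation: what appears as one flat-file record is spread over several tables keyed by sequence and feature identifiers, and the flat file is regenerated by joins.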

Page 35

Data Access and Use

•Network services
•Sequence Retrieval System (SRS): integrating and linking the main nucleotide and protein databases plus many specialized databases
•Database releases are produced quarterly, via FTP (incl. mirror sites) and CD-ROM
•Daily and cumulative updates via FTP
•Sequence search servers

Page 36

April 2003: TrEMBL 23.4 + SWISS-PROT 41.2

•829,111 TrEMBL entries
•123,721 SWISS-PROT entries
•Weekly production of a non-redundant and comprehensive protein sequence database consisting of SWISS-PROT, TrEMBL, and TrEMBLnew: ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/
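The slide does not say how the weekly non-redundant set is built. As a rough illustration only, the sketch below assumes that "non-redundant" means one entry per distinct sequence string, and that sources are merged in priority order (SWISS-PROT first, then TrEMBL, then TrEMBLnew). The function name merge_non_redundant and all accessions and sequences are invented; the real procedure may well use additional criteria such as organism.

```python
def merge_non_redundant(*sources):
    """Merge (source_name, records) pairs in priority order.

    records is an iterable of (accession, sequence). The first source that
    supplies a given sequence wins; later duplicates are dropped.
    """
    merged = {}
    for source_name, records in sources:
        for accession, sequence in records:
            key = sequence.upper()               # compare case-insensitively
            merged.setdefault(key, (source_name, accession))
    return [(src, acc, seq) for seq, (src, acc) in merged.items()]


# Toy data, invented for the example.
swissprot = [("SP0001", "MKT"), ("SP0002", "MPTIQ")]
trembl = [("TR0001", "MPTIQ"), ("TR0002", "MAAA")]
trembl_new = [("TN0001", "maaa"), ("TN0002", "MGGG")]

nrdb = merge_non_redundant(
    ("SWISS-PROT", swissprot),
    ("TrEMBL", trembl),
    ("TrEMBLnew", trembl_new),
)
for row in nrdb:
    print(row)
```

Listing the curated source first means its accession is the one kept whenever the same sequence also appears in a computer-annotated source.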

Page 37

Goals

•High level of annotation
•Minimal redundancy
•High level of integration with other databases
•Complete and up-to-date
•Availability

Page 38

Growth of TrEMBL and SWISS-PROT

[Chart: entries in thousands (axis 0 to 900) by publication date, Nov-96 to May-02; series: SWISS-PROT, TrEMBL]

Page 39

Automatic annotation of TrEMBL

•Data-mining to extract conditions from InterPro
•Extract SWISS-PROT reference entries fulfilling the conditions
•Extract common annotation
•Store conditions and common annotation in RuleBase
•Group TrEMBL by conditions
•Add common annotation to TrEMBL

[Diagram: TrEMBL, InterPro, RuleBase, SWISS-PROT]
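The annotation steps on this slide can be sketched in a few lines. This is a deliberately simplified illustration, not the RuleBase implementation: the data-mining step is replaced by grouping reference entries on their exact set of InterPro hits, "common annotation" is taken to be the intersection of annotation lines within a group, and a min_support threshold is assumed. All function names, identifiers such as IPR_A, and the toy entries are hypothetical.

```python
from collections import defaultdict


def build_rulebase(swissprot, min_support=2):
    """Derive rules: condition (set of InterPro ids) -> common annotation."""
    groups = defaultdict(list)
    for entry in swissprot:
        # Reference entries fulfilling the same condition are grouped together.
        groups[frozenset(entry["interpro"])].append(set(entry["annotation"]))
    rulebase = {}
    for condition, annotations in groups.items():
        if len(annotations) < min_support:
            continue                       # too few reference entries
        common = set.intersection(*annotations)
        if common:
            rulebase[condition] = common   # store condition + common annotation
    return rulebase


def annotate(trembl, rulebase):
    """Add the common annotation of every rule whose condition an entry meets."""
    result = {}
    for entry in trembl:
        hits = set(entry["interpro"])
        added = set()
        for condition, common in rulebase.items():
            if condition <= hits:          # all required InterPro hits present
                added |= common
        result[entry["acc"]] = sorted(added)
    return result


# Toy reference and target entries, invented for the example.
swissprot = [
    {"acc": "S1", "interpro": ["IPR_A"],
     "annotation": ["KW: Ribosomal protein", "DE: 30S ribosomal protein"]},
    {"acc": "S2", "interpro": ["IPR_A"], "annotation": ["KW: Ribosomal protein"]},
    {"acc": "S3", "interpro": ["IPR_B"], "annotation": ["KW: Kinase"]},
]
trembl = [
    {"acc": "T1", "interpro": ["IPR_A", "IPR_C"]},
    {"acc": "T2", "interpro": ["IPR_B"]},
]

rulebase = build_rulebase(swissprot)
annotated = annotate(trembl, rulebase)
print(annotated)
```

In this toy run only the IPR_A group has enough reference entries to yield a rule, so T1 receives the shared keyword and T2 is left unannotated.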

Page 40

Cross-references

SWISS-PROT/TrEMBL

•Domains, functional sites, protein families: InterPro, PROSITE, Pfam, PRINTS, ProDom, SMART
•Nucleotide sequence db: EMBL, GenBank, DDBJ
•3D/Structural dbs: HSSP, PDB
•Organism-spec. dbs: DictyDb, EcoGene, FlyBase, HIV, Leproma, MaizeDB, MGD, MypuList, SGD, StyGene, SubtiList, TIGR, TubercuList, WormPep, YEPD, Zfin
•Protein-specific dbs: GCRDb, MEROPS, REBASE, TRANSFAC
•2D-gel protein dbs: SWISS-2DPAGE, ANU-2DPAGE, COMPLUYEAST-2DPAGE, ECO2DBASE, HSC-2DPAGE, Aarhus and Ghent, MAIZE-2DPAGE, PHCI-2DPAGE, PMMA-2DPAGE, Siena-2DPAGE
•Human diseases: MIM
•PTM: CarbBank, GlycoSuiteDB

Page 41

TrEMBL
UniProt Archive
EnsEMBL
PDB
Patent Data
DDBJ/EMBL/GenBank
PIR
UniProt Knowledgebase: TrEMBL + SWISS-PROT
UniProt NREF100
UniProt NREF90
UniProt NREF50
SWISS-PROT
Other Data…
Classification
Automated Annotation
Literature Based Annotation
RefSeq

Page 42

Funding

•EMBL
•European Commission
•NIH
•Industrial licenses
•MRC
•IUPHAR

Page 43

Page 44

SWISS-PROT, TrEMBL, InterPro, etc., at EBI and SIB

•Group leaders: Rolf Apweiler, Amos Bairoch

•Co-ordinators: Wolfgang Fleischmann, Henning Hermjakob, Michele Magrane, Maria-Jesus Martin, Nicola Mulder, Claire O’Donovan, Manuela Pruess

•Annotators/curators: Philippe Aldebert, Andrea Auchincloss, Kirsty Bates, Marie-Claude Blatter Garin, Brigitte Boeckmann, Silvia Braconi Quintaj, Paul Browne, Evelyn Camon, Danielle Coral, Elisabeth Coudert, Tania de Oliveria Lima, Kirill Degtyarenko, Sylvie Dethiollaz, Ann Estreicher, Livia Famiglietti, Nathalie Farriol-Mathis, Stephanie Federico, Serenella Ferro, Gill Fraser, Raffaella Gatto, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Vivien Junker, Youla Karavidopoulou, Maria Krestyaninova, Kati Laiho, Minna Lehvaslaiho, Karine Michoud, Virginie Mittard, Madelaine Moinat, Sandra Orchard, Sandrine Pilbout, Sylvain Poux, Sorogini Reynaud, Catherine Rivoire, Bernd Röchert, Michel Schneider, Christian Sigrist, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Sandra van den Broek, Bob Vaughan, Eleanor Whitfield

•Programmers: Daniel Barrell, David Binns, Michael Darsow, Ujjwal Das, Eduardo de Castro, Alexander Fedotov, Astrid Fleischmann, Elisabeth Gasteiger, Alain Gateau, Andre Hackmann, Ivan Ivanyi, Eric Jain, Alexander Kanapin, Paul Kersey, Ernst Kretschmann, Corinne Lachaize, Chris Lewington, Xavier Martin, John Maslen, Peter McLaren, Rupinder Singh Mazara, Lorna Morris, John O’Rourke, Isabelle Phan, Astrid Rakow, Kai Runte, Florence Servant, Allyson Williams, Dan Wu

•Research staff: Kristian Axelsen, Pierre-Alain Binz, Nicolas Hulo, Anne-Lise Veuthey

•Clerical/secretarial assistance: Veronique Mangold, Claudia Sapsezian, Margaret Shore-Nye, Veronique Verbegue

•Students: Pavel Dobrokhotov, Alexandre Gattiker, various MCF, etc

