Date post: 01-Oct-2020
Value-adding, Access, and Use: Biological Databases as a Case Study
Transcript
Page 1: Value-adding, Access, and Use: Biological Databases as a ... · ID SLD746 standard; DNA; PRO; 477 BP. XX AC D83746; XX NI g1772347 XX DT 18-JAN-1997 (Rel. 50, Created) DT 17-FEB-1997

Value-adding, Access, and Use: Biological Databases as a Case Study

Page 2:

Genes…

Page 3:

…make proteins

Page 4:

Proteins form complex 3D structures

Page 5:

Molecules interact

Page 6:

the right molecules need to be present at the right time

Page 7:
Page 8:

EMBL-Bank: DNA sequences

Page 9:

EMBL-Bank: DNA sequences

SWISS-PROT + TrEMBL

InterPro

Page 10:

EnsEMBL: Metazoan Genome Gene Annotation

EMBL-Bank: DNA sequences

SWISS-PROT + TrEMBL

InterPro

Page 11:

EMBL-Bank: DNA sequences

Array-Express: Microarray Expression Data

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Page 12:

EMBL-Bank: DNA sequences

Array-Express: Microarray Expression Data

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Page 13:

EMBL-Bank: DNA sequences

EMSD: Macromolecular Structure Data

Array-Express: Microarray Expression Data

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Page 14:

EMBL-Bank: DNA sequences

EMSD: Macromolecular Structure Data

Array-Express: Microarray Expression Data

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Page 15:

EnsEMBL

IntAct: Protein-Protein Interaction Data

SWISS-PROT + TrEMBL

InterPro

EMSD: Macromolecular Structure Data

EMBL-Bank: DNA sequences

Array-Express: Microarray Expression Data

Page 16:
Page 17:
Page 18:

Integr8

Page 19:

EnsEMBL

IntAct: Protein-Protein Interaction Data

SWISS-PROT + TrEMBL

InterPro

EMSD: Macromolecular Structure Data

EMBL-Bank: DNA sequences

Array-Express: Microarray Expression Data

Page 20:

IntAct: Protein-Protein Interaction Data

SWISS-PROT + TrEMBL

InterPro

EMBL-Bank: DNA sequences

Page 21:

Running a database project

[Diagram labels: Submitters; Genomes, Genes, Patents, Updates; Q/C etc; Service DB; Add value (computation); Add value (review etc.); Releases & Updates; Service Tools; End Users; Database design]

Page 22:

Running a database project

[Diagram labels: Submitters; Genomes, Genes, Patents, Updates; Q/C etc; Production DB; Service DB; Add value (computation); Add value (review etc.); Releases & Updates; Service Tools; End Users; Database design]

Page 23:

Running a database project

[Diagram labels: Submitters; Submission tools; Genomes, Genes, Patents, Updates; Q/C etc; Production DB; Service DB; Add value (computation); Add value (review etc.); Releases & Updates; Service Tools; End Users; Database design]

Page 24:

Running a database project

[Diagram labels: Submitters; Submission tools; Genomes, Genes, Patents, Updates; Q/C etc; Production DB; Service DB; Add value (computation); Add value (review etc.); Releases & Updates; Service Tools; End Users; Database design]

Page 25:

Running a database project

[Diagram labels: Submitters; Submission tools; Genomes, Genes, Patents, Updates; Q/C etc; Production DB; Service DB; Add value (computation); Add value (review etc.); Releases & Updates; Service Tools; End Users; Database design]

Page 26:

Running a database project

[Diagram labels: Submitters; Submission tools; Genomes, Genes, Patents, Updates; Q/C etc; Production DB; Service DB; Add value (computation); Add value (review etc.); Releases & Updates; Service Tools; End Users; Database design]

Page 27:

Running a database project

[Diagram labels: Submitters; Submission tools; Genomes, Genes, Patents, Updates; Q/C etc; Production DB; Service DB; Add value (computation); Add value (review etc.); Releases & Updates; Data Distrib.; Service Tools; End Users; Database design]

Page 28:

Running a database project

[Diagram labels: Submitters; Submission tools; Genomes, Genes, Patents, Updates; Other archives; Data exchange; Q/C etc; Production DB; Service DB; Add value (review etc.); Releases & Updates; Data Distrib.; Service Tools; End Users; Database design]

Page 29:

Running a database project

[Diagram labels: Submitters; Submission tools; Genomes, Genes, Patents, Updates; Other archives; Data exchange; Q/C etc; Development DB; Production DB; Service DB; Add value (review etc.); Releases & Updates; Data Distrib.; Service Tools; End Users; Database design]

Page 30:

Running a database project

[Diagram labels: Submitters; Submission tools; Genomes, Genes, Patents, Updates; Other archives; Data exchange; Q/C etc; Development DB; Production DB; Service DB; Add value (computation); Add value (review etc.); Releases & Updates; Data Distrib.; Service Tools; End Users; Database design]

Page 31:

EMBL nucleotide sequence database

Page 32:

Dataflow

Page 33:

ID   SLD746     standard; DNA; PRO; 477 BP.
XX
AC   D83746;
XX
NI   g1772347
XX
DT   18-JAN-1997 (Rel. 50, Created)
DT   17-FEB-1997 (Rel. 50, Last updated, Version 2)
XX
DE   Streptomyces lividans DNA for ribosomal protein S12, complete cds.
XX
KW   ribosomal protein S12.
XX
OS   Streptomyces lividans
OC   Eubacteria; Firmicutes; Actinomycetes; Streptomycetes;
OC   Streptomycetaceae; Streptomyces.
XX
RN   [1]
RP   1-477
RA   Shima J.;
RT   ;
RL   Submitted (06-MAR-1996) to the EMBL/GenBank/DDBJ databases.
RL   Jun Shima, National Food Research Institute; Kannondai 2-1-2, Tsukuba,
RL   Ibaraki 305, Japan (E-mail:[email protected], Tel:0298-38-8124,
RL   Fax:0298-38-7996)
XX
DR   SPTREMBL; P97222; P97222.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..477
FT                   /organism="Streptomyces lividans"
FT                   /strain="TK21"
FT   CDS             28..399
FT                   /db_xref="PID:g1772348"
FT                   /db_xref="SPTREMBL:P97222"
FT                   /product="ribosomal protein S12"
FT                   /translation="MPTIQQLVRKGRQDKVEKNKTPALEGSPQRRGVCTRVFTTTPKKP
FT                   NSALRKVARVRLTSGIEVTAYIPGEGHNLQEHSIVLVRGGRVKDLPGVRYKIIRGSLDT
FT                   QGVKNRKQARSRYGAKKEK"
FT   mutation        289
FT                   /replace="g"
FT                   /phenotype="streptomycin resistant mutant TK24"
XX
SQ   Sequence 477 BP; 99 A; 153 C; 152 G; 73 T; 0 other;
     ATTCGGCACA CAGAAACCGG AGAAGTAGTG CCTACGATCC AGCAGCTGGT CCGGAAGGGC        60
     CGGCAGGACA AGGTCGAGAA GAACAAGACG CCCGCACTCG AGGGTTCGCC CCAGCGCCGT       120
     GGCGTCTGCA CGCGTGTGTT CACGACCACC CCGAAGAAGC CGAACTCGGC CCTGCGTAAG       180
     GTCGCGCGTG TGCGTCTGAC CAGTGGGATC GAGGTCACCG CTTACATTCC GGGTGAGGGG       240
     CACAACCTGC AGGAGCACTC CATCGTGCTC GTGCGCGGCG GCCGTGTGAA GGACCTGCCG       300
     GGTGTTCGCT ACAAGATCAT CCGCGGTTCG CTTGACACCC AGGGTGTGAA GAACCGCAAG       360
     CAGGCCCGCA GCCGCTACGG CGCCAAGAAG GAGAAGTAAG AATGCCTCGT AAGGGCCCCG       420
     CCCCGAAGCG CCCGGTCATC ATCGACCCGG TCTACGGTTC TCCTCTGGTG ACCTCCC          477
//

EMBL Flat File
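The entry on this slide is organised by two-letter line codes (ID, AC, DT, DE, FT, SQ and so on), with XX as a spacer and // as the terminator, which makes it easy to split into fields. The following is a minimal Python sketch, assuming the layout shown on the slide (line code first, data after it, sequence lines following SQ). The function name `parse_embl_entry` and the returned dictionary layout are illustrative choices, not part of any EBI tool, and the sample is abridged from the slide's entry.

```python
from collections import defaultdict

def parse_embl_entry(text):
    """Group the lines of one EMBL flat-file entry by their two-letter line code.

    Returns (fields, sequence): fields maps a line code such as "AC" or "DE"
    to the list of data strings found on those lines; sequence is the
    concatenated nucleotide string from the lines that follow SQ.
    """
    fields = defaultdict(list)
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):            # entry terminator
            break
        if in_sequence:
            # Sequence lines hold blocks of bases plus a running position count.
            seq_parts.extend(tok for tok in line.split() if tok.isalpha())
            continue
        code, data = line[:2], line[2:].strip()
        if code == "XX" or not code.strip():  # spacer or blank line, no data
            continue
        fields[code].append(data)
        if code == "SQ":                      # everything after SQ is sequence
            in_sequence = True
    return dict(fields), "".join(seq_parts).upper()

# Abridged copy of the entry shown on the slide.
sample = """\
ID   SLD746     standard; DNA; PRO; 477 BP.
XX
AC   D83746;
XX
DE   Streptomyces lividans DNA for ribosomal protein S12, complete cds.
XX
SQ   Sequence 477 BP; 99 A; 153 C; 152 G; 73 T; 0 other;
     ATTCGGCACA CAGAAACCGG AGAAGTAGTG CCTACGATCC AGCAGCTGGT CCGGAAGGGC        60
//
"""
fields, seq = parse_embl_entry(sample)
print(fields["AC"])  # ['D83746;']
print(len(seq))      # 60
```

Multi-line fields such as DE, OC or FT are kept as lists of lines here; joining or further parsing them (for example, feature qualifiers) is left out of the sketch.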

Page 34:

EMBL Relational Schema

VARIATIONFEATURE @ PRD# * FEATID * FKEY# o FREQUENCY o REPLACE * USERSTAMP * TIMESTAMP

UNCLASSIFIEDFEATURE @ PR# * FEATID * FKEY# o USERSTAMP o TIMESTAMP

TRANSLATIONEXCEPTION @ PRDB1 * AMINOACID * FEATID * TRANSXEND# * TRANSXID * TRANSXSTART o BIOSEQID * USERSTAMP * TIMESTAMP

TRANSCRIPTFEATURE # * FEATID * FKEY# o ENUMBER o FIVE_CONS o PSEUDO o READINGFRAME o THREE_CONS * USERSTAMP * TIMESTAMP

THESIS @ PRDB1 (#) * INSTITUTE## * PUBID o ADVISOR o DEGREE * USERSTAMP * TIMESTAMP

SUBMISSIONREF @ PRDB1 (#)# * PUBID o DS# o MEDIUM * USERSTAMP * TIMESTAMP

SOURCEFEATURE @ PRDB1 (#)# * FEATID * FKEY# o ORGANISM o CHROMOSOME o FREQUENCY o HAPLOTYPE o NUCSOURCE o ORGANELLE o SEQUENCED o SEX o VIRION * USERSTAMP * TIMESTAMP

SIGNALFEATURE @ PRDB# * FEATID * FKEY# o DIRECTION o PSEUDO * USERSTAMP * TIMESTAMP

SEQFEATURE_NOTE @ PRDB1 * FEATID * LINE# * NOTE# * TEXT o USERSTAMP o TIMESTAMP

SEQFEATURE @ PRDB1 (#) * BIOSEQID# * FEATID * FTYPE# * LOCATION * ORDER_IN o EVIDENCE o FEAT_LABEL o INCOMPLETE o SUBMITTOR * USERSTAMP * TIMESTAMP

RPTUNIT @ PRDB * FEATID o RPTID o LABEL o RPTEND o RPTSTART * USERSTAMP * TIMESTAMP

RNAFEATURE @ PRDB1 (#)# * FEATID * FKEY# o ANTICODONAA o ANTICODONEND o ANTICODONSTART o PSEUDO * USERSTAMP * TIMESTAMP

PUB_XREF @ PRDB1 * PUBID * DBCODE * PRIMARYID

PUBLICATION @ PRDB1 (#)# * PUBID * PUBLANG * PUBSTATUS * PUBTYPE o PUBDATE o TITLE * USERSTAMP * TIMESTAMP

PUBAUTHOR @ PRDB * ORDERIN * PERSON * PUBID o EDITORFLAG * USERSTAMP * TIMESTAMP

PROTEINSEQ @ PRDB1 (## * SEQID o DERIVED o MOLWEIGHT * USERSTAMP * TIMESTAMP

PROTEINCODINGFEATURE @ PR# * FEATID * FKEY# * PROTEINSEQID o ENUMBER o PSEUDO o READINGFRAME o TRANSL_TABLE_ID * USERSTAMP * TIMESTAMP

PHYSICALSEQ @ PRDB1 (#)# * PHYSEQID * SEQTEXT * USERSTAMP * TIMESTAMP

PERSONALCOMM @ PRDB1 * ADDRESS * PUBID * RECIPIENT o PHONEID * USERSTAMP * TIMESTAMP

PERSON @ PRDB1 (#)# * PERSONID * SURNAME o FIRSTNAME o MIDINITIALS * USERSTAMP * TIMESTAMP

PATENT_BIOSEQ @ PRDB1 (# * ORDERIN# * PUBID# * SEQID * USERSTAMP * TIMESTAMP

PATENTPRIORITY @ P * PRIORITY_DATE * PRIORITY_NO * PRIORITY_OFFICE * PRIORITY_ORDER * PUBID * USERSTAMP * TIMESTAMP

PATENTCLASS @ PR * CLASS * CLASS_ORDER * PUBID * USERSTAMP * TIMESTAMP

PATENTAPPLICANT * APPNAME * ORDERIN * PUBID * USERSTAMP * TIMESTAMP

PATENTABSTRACT @ P * ABSTRACT# * PUBID * USERSTAMP * TIMESTAMP

PATENT @ PRDB1 (#) * DOCNUM * DOCOFFICE * DOCTYPE# * PUBID o APPDATE o APPNUM o APPOFFICE * USERSTAMP * TIMESTAMP

NUCSTRUCTUREFEATURE @ PRD# * FEATID * FKEY# o ENUMBER o FREQUENCY o MODBASE o ORGANISM o RPTFAMILY * USERSTAMP * TIMESTAMP

NUCSEQ @ PRDB1 (#) * A_COUNT * C_COUNT * G_COUNT * OTHER_COUNT# * SEQID * T_COUNT o STRAND o TOPOLOGY * USERSTAMP * TIMESTAMP

NTX_TAX_NODE @ PRDB1 (#)# * TAX_ID o PARENT_ID o RANK o EMBL_CODE o DIV_ID * INHERIT_DIV_ID o GC_ID o INHERIT_GC_ID o MGC_ID o INHERIT_MGC_ID * HIDDEN o NO_SEQUENCE o REMARK

NTX_SYNONYM @ PRDB1 * TAX_ID * NAME_TXT o UNIQUE_NAME o NAME_CLASS * UPPER_NAME_TXT

NTX_RANK @ PRDB o RANK_ID# * RANK_TXT

NTX_CLASS @ PRDB1 o CLASS_CODE# * CLASS_TEXT o PRIORITY

LOCATION_TREE @ PRDB1 (#)# * LOCNODEID * ORDER_IN o COMPLEMENT o END_FSIZE o END_FTYPE o GAP_FSIZE o GAP_FTYPE o LITERAL o OPERATOR o PARENTID o REPLACE o REPL_STRING o SEG_END o SEG_GAP o SEG_START * SEQID o START_FSIZE o START_FTYPE * USERSTAMP * TIMESTAMP

KEYWORD_SYNONYM @ P# * KEYWORDID1# * KEYWORDID2

KEYWORD @ PRD * COMPRESSED_KW * KEYWORD# * KEYWORDID o DBCODE o DESCRIPTION * USERSTAMP * TIMESTAMP

JOURNALARTICLE @ PRDB1 (# * FIRSTPAGE * ISSN# * PUBID * VOLUME o ARTICLETYPE o ISSUE o LASTPAGE o ORDER_ON_PAGE o SUPPLEMENT * USERSTAMP * TIMESTAMP

INSTITUTE @ PRDB# * INSTITUTE# * INSTITUTE_NAME * USERSTAMP * TIMESTAMP

IMMUNOFEATURE @ PRDB# * FEATID * FKEY# o CHIMERIC o PSEUDO o READINGFRAME o TRANSL_TABLE_ID * USERSTAMP * TIMESTAMP

GENE @ PRDB1 (# * GENEID * GENENAME o DBCODE o EXTDBID o ORGANISM o PATHOLOGY o PRODUCT * USERSTAMP * TIMESTAMP

FEATURE_RELATIONSHIP @ PRDB1 (#)# * FEATID1# * FEATID2# * RELATION * USERSTAMP * TIMESTAMP

FEATURE_QUALIFIERS @ PRDB1 (#)# * FEATID# * ORDER_ON * QUAL# o TEXT o USERSTAMP o TIMESTAMP

ERROR_QUALIFIERS @ PRDB1 (## * FEATID# * ORDER_ON

DBENTRY_KEYWORD @ PR * KEYWORDID * DBENTRYID o USERSTAMP o TIMESTAMP

DBENTRY_DESCR @ PRD# * DBENTRYID# * LINE# * TEXT o USERSTAMP o TIMESTAMP

DBENTRY_COMMENT @ PR# * COMMENTID# * DBENTRYID * USERSTAMP * TIMESTAMP

DBENTRY @ PRDB1 (#) * BIOSEQID# * DBENTRYID * ENTRY_NAME * ENTRY_STATUS * PRIMARYACC# * VERSION# o ANN_DATE o CLEAN_LISTING o CONFIDENTIAL o DBCODE o EXT_DATE o EXT_VER o FIRST_CREATED o FIRST_PUBLIC o HOLD_DATE o MISSING_PAPER o PROJECT# o SUBMIT_TOOL o WAIT_FOR_PAPER * USERSTAMP * TIMESTAMP o FFDATE

DATABASE_XREF @ PRD * EBI_DB * ACC# o NID_TEXT o PID_TEXT * EXT_DB * PRIMARYID o SECONDARYID

COMMENT_TEXT @ PRD# * COMMENTID# * LINE# * TOPICTYPE o PRIVATE o TEXT...

CODONEXCEPTION @ PRD * AMINOACID * CODONSEQ# * CODONXID * FEATID o USERSTAMP o TIMESTAMP

CITATIONSEQFEATURE @ PRDB1 (#)# * FEATID# * PUBID# * SEQID * USERSTAMP * TIMESTAMP

CITATIONBIOSEQ @ PRDB1 (# * ORDERIN# * PUBID# * SEQID o CITCOMMENT o FULLSEQ o LOCNODEID o LOCTYPE * USERSTAMP * TIMESTAMP

BOOK @ PRDB1 (#) * BOOKTITLE * FIRSTPAGE * LASTPAGE# * PUBID * PUBLISHER o EDITION o ISBN o PUBPLACE o SERIES o VOLUME * USERSTAMP * TIMESTAMP

BIOSEQ @ PRDB1 (#) * BIOSEQTYPE * CHKSUM * MOLECULETYPE# * SEQID * SEQLEN o DDBJSID o EBISID o LOGSEQ o NCBIGI o PHYSEQ * USERSTAMP * TIMESTAMP

ACCPAIR @ PRDB1 (#)# * PRIMARY# * SECONDARY * USERSTAMP * TIMESTAMP

ACCEPTED @ PRDB1 (#) * ISSN# * PUBID o ARTICLETYPE o FIRSTPAGE o ISSUE o LASTPAGE o ORDER_ON_PAGE o SUPPLEMENT o VOLUME * USERSTAMP * TIMESTAMP

[Foreign-key labels in the schema diagram: FK_ACC, FK_ACCPAIR_127, FK_BIOSEQ_4, FK_BIOSEQ_5, FK_BOOK_59, FK_CITATIONBIOSEQ_60, FK_CITATIONBIOSEQ_61, FK_CITATIONSEQFEATURE_10, FK_CITATIONSEQFEATURE_92, FK_CODONEXCEPTION_50, FK_DBENTRY_25, FK_DBENTRY_COMMENT_100, FK_DBENTRY_DESCR_117, FK_DBENTRY_KEYWORD_39, FK_DBENTRY_KEYWORD_40, FK_ERROR_QUALIFIERS_122, FK_FEATURE_QUALIFIERS_121, FK_FEATURE_RELATIONSHIP_28, FK_FEATURE_RELATIONSHIP_29, FK_GENE_64, FK_IMMUNOFEATURE_7, FK_JOURNALARTICLE_66, FK_LOCATION_TREE_68, FK_LOCATION_TREE_72, FK_NAME_CLASS, FK_NTX_SYNONYM_46, FK_NUCSEQ_76, FK_NUCSTRUCTUREFEATURE_55, FK_NUCSTRUCTUREFEATURE_57, FK_PATENTABSTRACT_1, FK_PATENTAPPLICANT_2, FK_PATENTCLASS_98, FK_PATENTPRIORITY_97, FK_PATENT_77, FK_PATENT_BIOSEQ_107, FK_PATENT_BIOSEQ_108, FK_PERSONALCOMM_3, FK_PROTEINCODINGFEATURE_12, FK_PROTEINCODINGFEATURE_13, FK_PROTEINSEQ_80, FK_PROTEINSEQ_81, FK_PUBAUTHOR_82, FK_PUBAUTHOR_83, FK_PUB_XREF_1, FK_RANK, FK_RNAFEATURE_15, FK_RPTUNIT_109, FK_SEQFEATURE_35, FK_SEQFEATURE_36, FK_SEQFEATURE_NOTE_78, FK_SIGNALFEATURE_18, FK_SOURCEFEATURE_20, FK_SOURCEFEATURE_21, FK_SUBMISSIONREF_119, FK_THESIS_106, FK_THESIS_84, FK_THESIS_85, FK_TRANSCRIPTFEATURE_52, FK_TRANSLATIONEXCEPTION_110, FK_TRANSLATIONEXCEPTION_51, FK_UNCLASSIFIEDFEATURE_118, FK_UNPUBLISHED_114, FK_VARIATIONFEATURE_48]

[Schema groups: Location Info; Feature Info; Taxonomy Info; Reference Info; Sequence Info]
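To make the relationship between the flat file and the relational schema more concrete, here is a minimal Python sketch using the standard `sqlite3` module. It builds a tiny subset of two tables named on the slide, BIOSEQ and DBENTRY, and joins them to recover flat-file style information for one accession. Everything beyond the table and column names is an assumption: the column types, the simplified column name `PRIMARYACC`, and the link from `DBENTRY.BIOSEQID` to `BIOSEQ.SEQID` are guesses for illustration, and the production schema was not SQLite.

```python
import sqlite3

def build_demo_schema():
    """Create an in-memory subset of the schema: BIOSEQ and DBENTRY only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE BIOSEQ (
            SEQID        INTEGER PRIMARY KEY,
            MOLECULETYPE TEXT NOT NULL,
            SEQLEN       INTEGER NOT NULL
        );
        CREATE TABLE DBENTRY (
            DBENTRYID  INTEGER PRIMARY KEY,
            BIOSEQID   INTEGER NOT NULL REFERENCES BIOSEQ(SEQID),  -- assumed link
            ENTRY_NAME TEXT NOT NULL,
            PRIMARYACC TEXT NOT NULL
        );
    """)
    return conn

def entry_summary(conn, accession):
    """Join an entry to its sequence record, as a flat-file writer might."""
    return conn.execute(
        "SELECT e.ENTRY_NAME, e.PRIMARYACC, s.MOLECULETYPE, s.SEQLEN "
        "FROM DBENTRY e JOIN BIOSEQ s ON s.SEQID = e.BIOSEQID "
        "WHERE e.PRIMARYACC = ?",
        (accession,),
    ).fetchone()

conn = build_demo_schema()
# Values taken from the ID and AC lines of the flat-file entry in this deck.
conn.execute("INSERT INTO BIOSEQ VALUES (1, 'DNA', 477)")
conn.execute("INSERT INTO DBENTRY VALUES (1, 1, 'SLD746', 'D83746')")
print(entry_summary(conn, "D83746"))  # ('SLD746', 'D83746', 'DNA', 477)
```

The point of the sketch is only that one flat-file entry is spread over several normalised tables, so producing a release means joining them back together.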

Page 35:

Data Access and Use

• Network services
• Sequence Retrieval System (SRS): integrating and linking the main nucleotide and protein databases plus many specialized databases
• Database releases are produced quarterly, via FTP (inc. mirror sites) and CD-ROM
• Daily and cumulative updates via FTP
• Sequence search servers

Page 36:

April 2003: TrEMBL 23.4 + SWISS-PROT 41.2

• 829,111 TrEMBL entries
• 123,721 SWISS-PROT entries
• Weekly production of a non-redundant and comprehensive protein sequence database consisting of SWISS-PROT, TrEMBL, and TrEMBLnew: ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/
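The slide does not say how redundancy is removed when SWISS-PROT, TrEMBL, and TrEMBLnew are combined. As a rough illustration only, the Python sketch below assumes the simplest possible criterion, exact sequence identity, and keeps the first record seen for each distinct sequence, so sources listed earlier take priority. The function name `merge_non_redundant` and all accessions and sequences are hypothetical.

```python
def merge_non_redundant(*sources):
    """Merge (source_name, records) pairs in priority order.

    Each record is an (accession, sequence) pair. For every distinct
    sequence, only the first record encountered is kept.
    """
    seen = {}
    for source_name, records in sources:
        for accession, sequence in records:
            key = sequence.upper()          # compare sequences case-insensitively
            if key not in seen:
                seen[key] = (source_name, accession)
    return seen

# Hypothetical records: TR_1 duplicates the sequence of SP_1.
swissprot = [("SP_1", "MKTAYIAK")]
trembl = [("TR_1", "MKTAYIAK"), ("TR_2", "MSLLTEVE")]

merged = merge_non_redundant(("SWISS-PROT", swissprot), ("TrEMBL", trembl))
print(sorted(merged.values()))  # [('SWISS-PROT', 'SP_1'), ('TrEMBL', 'TR_2')]
```

Listing the curated source first means its record wins whenever the same sequence appears in more than one source.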

Page 37:

Goals

• High level of annotation
• Minimal redundancy
• High level of integration with other databases
• Complete and up-to-date
• Availability

Page 38:

Growth of TrEMBL and SWISS-PROT

[Chart: x-axis, Publication Date (Nov-96 to May-02); y-axis, Entries in 1000 (0 to 900); series: SWISS-PROT, TrEMBL]

Page 39:

Automatic annotation of TrEMBL

• Data-mining to extract conditions from InterPro
• Extract SWISS-PROT reference entries fulfilling the conditions
• Extract common annotation
• Store conditions and common annotation in RuleBase
• Group TrEMBL by conditions
• Add common annotation to TrEMBL

[Diagram labels: TrEMBL; InterPro; RuleBase; SWISS-PROT]
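The annotation steps on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual RuleBase implementation: a "condition" is taken to be a set of InterPro identifiers that an entry must carry, the data-mining step is skipped (conditions are passed in ready-made), and entries are plain dictionaries. All identifiers and annotation strings are hypothetical.

```python
def build_rulebase(swissprot, conditions):
    """Map each condition to the annotation shared by all matching SWISS-PROT entries.

    A condition is a frozenset of InterPro IDs. An entry matches when it
    carries every ID in the condition.
    """
    rulebase = {}
    for condition in conditions:
        refs = [e for e in swissprot if condition <= e["interpro"]]
        if not refs:
            continue
        # Common annotation: lines present in every reference entry.
        common = set.intersection(*(set(e["annotation"]) for e in refs))
        if common:
            rulebase[condition] = common
    return rulebase

def annotate_trembl(trembl, rulebase):
    """Add each rule's common annotation to every TrEMBL entry meeting its condition."""
    for entry in trembl:
        for condition, common in rulebase.items():
            if condition <= entry["interpro"]:
                entry["annotation"] |= common
    return trembl

# Hypothetical data.
swissprot = [
    {"id": "SP_1", "interpro": {"IPR_A", "IPR_B"},
     "annotation": {"KW: Ribosomal protein", "KW: RNA-binding"}},
    {"id": "SP_2", "interpro": {"IPR_A"},
     "annotation": {"KW: Ribosomal protein"}},
]
trembl = [
    {"id": "TR_1", "interpro": {"IPR_A"}, "annotation": set()},
    {"id": "TR_2", "interpro": {"IPR_C"}, "annotation": set()},
]

rulebase = build_rulebase(swissprot, [frozenset({"IPR_A"})])
annotate_trembl(trembl, rulebase)
print(trembl[0]["annotation"])  # {'KW: Ribosomal protein'}
print(trembl[1]["annotation"])  # set()
```

Only annotation present in every matching reference entry is propagated, which is why "KW: RNA-binding" (found in SP_1 alone) is not copied to TR_1.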

Page 40:

Cross-references

SWISS-PROT/TrEMBL

• Domains, functional sites, protein families: InterPro, PROSITE, Pfam, PRINTS, ProDom, SMART
• Nucleotide sequence db: EMBL, GenBank, DDBJ
• 3D/Structural dbs: HSSP, PDB
• Organism-spec. dbs: DictyDb, EcoGene, FlyBase, HIV, Leproma, MaizeDB, MGD, MypuList, SGD, StyGene, SubtiList, TIGR, TubercuList, WormPep, YEPD, Zfin
• Protein-specific dbs: GCRDb, MEROPS, REBASE, TRANSFAC
• 2D-gel protein dbs: SWISS-2DPAGE, ANU-2DPAGE, COMPLUYEAST-2DPAGE, ECO2DBASE, HSC-2DPAGE, Aarhus and Ghent, MAIZE-2DPAGE, PHCI-2DPAGE, PMMA-2DPAGE, Siena-2DPAGE
• Human diseases: MIM
• PTM: CarbBank, GlycoSuiteDB

Page 41:

[Diagram labels: TrEMBL; UniProt Archive; EnsEMBL; PDB; Patent Data; DDBJ/EMBL/GenBank; PIR; UniProt Knowledgebase: TrEMBL + SWISS-PROT; UniProt NREF100; UniProt NREF90; UniProt NREF50; SWISS-PROT; Other Data…; Classification; Automated Annotation; Literature Based Annotation; RefSeq]

Page 42:

Funding

• EMBL
• European Commission
• NIH
• Industrial licenses
• MRC
• IUPHAR

Page 43:
Page 44:

SWISS-PROT, TrEMBL, InterPro, etc., at EBI and SIB

• Group leaders: Rolf Apweiler, Amos Bairoch
• Co-ordinators: Wolfgang Fleischmann, Henning Hermjakob, Michele Magrane, Maria-Jesus Martin, Nicola Mulder, Claire O’Donovan, Manuela Pruess
• Annotators/curators: Philippe Aldebert, Andrea Auchincloss, Kirsty Bates, Marie-Claude Blatter Garin, Brigitte Boeckmann, Silvia Braconi Quintaj, Paul Browne, Evelyn Camon, Danielle Coral, Elisabeth Coudert, Tania de Oliveria Lima, Kirill Degtyarenko, Sylvie Dethiollaz, Ann Estreicher, Livia Famiglietti, Nathalie Farriol-Mathis, Stephanie Federico, Serenella Ferro, Gill Fraser, Raffaella Gatto, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Vivien Junker, Youla Karavidopoulou, Maria Krestyaninova, Kati Laiho, Minna Lehvaslaiho, Karine Michoud, Virginie Mittard, Madelaine Moinat, Sandra Orchard, Sandrine Pilbout, Sylvain Poux, Sorogini Reynaud, Catherine Rivoire, Bernd Röchert, Michel Schneider, Christian Sigrist, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Sandra van den Broek, Bob Vaughan, Eleanor Whitfield
• Programmers: Daniel Barrell, David Binns, Michael Darsow, Ujjwal Das, Eduardo de Castro, Alexander Fedotov, Astrid Fleischmann, Elisabeth Gasteiger, Alain Gateau, Andre Hackmann, Ivan Ivanyi, Eric Jain, Alexander Kanapin, Paul Kersey, Ernst Kretschmann, Corinne Lachaize, Chris Lewington, Xavier Martin, John Maslen, Peter McLaren, Rupinder Singh Mazara, Lorna Morris, John O’Rourke, Isabelle Phan, Astrid Rakow, Kai Runte, Florence Servant, Allyson Williams, Dan Wu
• Research staff: Kristian Axelsen, Pierre-Alain Binz, Nicolas Hulo, Anne-Lise Veuthey
• Clerical/secretarial assistance: Veronique Mangold, Claudia Sapsezian, Margaret Shore-Nye, Veronique Verbegue
• Students: Pavel Dobrokhotov, Alexandre Gattiker, various MCF, etc.

