12 November 2003 Michael Ley 1
dblp.uni-trier.de
Computer Science Digital Libraries
a personal view
JBIDI 2002
Alicante, Spain
Thank you for the invitation
alternative title of this talk:
An Amateur’s Computer Science
Digital Library
About me …
• born 1959
• home: bookstore (small family business)
• 1986 diploma in informatics (Aachen)
• 1993 Ph.D. (Trier)
• since 1993 lecturer at University of Trier
  – Programming for 1st/2nd year students
  – DB implementation, Digital Libraries
Computer Science
Information & Library Science
DLs & JBiDi
Why is DBLP „An Amateur’s CS DL“?
• no formal degree in library management
• unprofessional methods of software development & use
nevertheless DBLP seems to be useful …
Outline
1. History of DBLP
2. Computer Science Publications & DLs
3. Technical Background
4. Demo
[Atzeni et al. VLDB 1997]
The Beginning: End of 1993
• Simple test of Web technology: Xmosaic, NCSA HTTP server
• Tables of contents:
  – Journals and proceedings
  – DataBase systems / Logic Programming
Basic Architecture
„Home“
Entry Pages: Conferences, Journals
Streams: VLDB, TKDE, …
Tables of Contents: VLDB’93, VLDB’92, VLDB’90, TKDE 5, TKDE 4, TKDE 3, …
Person-Publication Network
Tables of Contents: VLDB’93, VLDB’92, VLDB’90, TKDE 5, TKDE 4, TKDE 3, …
Person Pages: J. D. Ullman, H.-J. Schek, S. Ceri, …
Person Index, Person Search
Early Recognition
• 1997:
  – ACM SIGMOD Service Award
  – VLDB Endowment Special Recognition Award
• Helped to make DBLP a more „official“ project & to get a small initial fund
ACM SIGMOD Anthology
• Idea by Rick Snodgrass (SIGMOD chair 1997-2001):
  – scan in „historical“ DB publications
  – combine these full texts and an improved version of DBLP into a CDROM-based digital library
Anthology: Contents
Journals, Newsletters:
• TODS
• TKDE
• VLDB Journal
• Distr. & Parallel DB
• Data Engineering
• SIGMOD Record
• SIGKDD Expl.
• SIGIR Forum
• Data Base
Proceedings:
• ACM DL
• ACM GIS
• ADBIS
• CIKM
• COOPIS
• DASFAA
• DBPL
• DOLAP
• EDBT
• ER
• Hypertext
• ICDT
• KRDB
• MFDBS
• NPIV
• PDIS
• PODS
• POS
• SIGIR
• SIGMOD
• SIGFIDET
• SSD
• SSDBM
• VLDB
• XP
+ several Workshops
Books:
• Abiteboul/Hull/Vianu: Foundations of DBs
• Bernstein/Hadzilacos/Goodman: Concurrency Control and Recovery in Database Systems
• Maier: Theory of Relational Databases
• Gray: The Benchmark Handbook
• Stonebraker: The INGRES Papers
• Wiederhold: Database Design (2nd Ed.)
• Snodgrass: The TSQL2 Temporal Query L.
21 CDROMs
> 150,000 pages of full text
        size     # files   # PDF files
DVD 1   7.5 G      10339         10025
DVD 2   6.31 G    201902          4284
Anthology: Citation Links
Paper A (yyy): References: [1] … [2] B: xxx. [3] …
Paper B (xxx): References: [1] … Referenced by: • A: yyy. • …
ACM SIGMOD Digital Symposium Collection (DiSC):
Recent Material (4 CDROMs, 1 DVD)
edited by Isabel Cruz (1999-2001), Aidong Zhang (2002), …
ACM SIGMOD Contributions Award 2003
Economic Models for Networked Information
[William Y. Arms, iMP, March 2000]
• Restricted access: use-based payment
• Restricted access: subscriptions
• Open access: advertising
• Open access: external funding (DBLP)
Sponsor Found / Expansion
• help by MS Research & Jim Gray made it possible to expand DBLP to cover most areas of computer science
• students were hired to enter data
Other Sponsors / Supporters
• Max-Planck-Institut für Informatik (Saarbrücken): Students / Library
• University of Trier
• VLDB Endowment
• IFIP
• CSREA
• Dagstuhl Library
New Project:
End of 2002 – September 2005
CCSB (Achilles)
CompuScience
LeaBib (Mayr)
DBLP
Service Research
Semantic Methods and Tools for Information Portals
• FIZ Karlsruhe
• GI / Uni Frankfurt
• AIFB, Uni Karlsruhe
• TU München
• Uni Trier

• AIFB, Uni Karlsruhe
• DFKI Saarbrücken
• FhG IPSI, Darmstadt
• Uni Trier
Our Group (Nov.2003)
• Prof. Dr. Bernd Walter
• Dr. Michael Ley
• Alexander Weber
• Emma Rabbidge
• Stephan Klink
• Patrick Reuther
• B. Weiland (Secretary)
• Marion Sievers
• Doris Holzträger
• Dietrich Arlart
• Dongxin Yi
[Chart] DBLP: Number of Bibliographic Records (x-axis: January 1996 to September 2003; y-axis: 0 to 500,000 records)
[Chart] Number of records by publication year (1 November 2003) (x-axis: year, 1945 to 2001; y-axis: number of entries, 0 to 40,000)
Outline
1. History of DBLP
2. Computer Science Publications & DLs
3. Technical Background
4. Demo
Main Players (1)
• Learned societies:
  – ACM
  – IEEE(-CS)
  – many small societies
• Commercial publishers:
  – Elsevier group (North-Holland, Pergamon, Academic Press, Morgan Kaufmann, …)
  – Springer/Kluwer
  – …
Main Players (2)
• Open access online publications:
  – CoRR, NCSTRL, …
  – CEUR, …
  – ECCC, …
• Abstracting & Indexing services:
  – INSPEC, ACM Portal, CompuScience, …
  – Achilles (CCSB), CiteSeer, DBLP, LeaBib, HCI, …
  – Google et al.
(-) insufficient information about the volume: no editors, no ISBN, …
(+) session titles, page numbers, DOI
no easy navigation to a certain volume
insufficient information: no editors, no volume title, no session titles
Steve Lawrence: Online or invisible? Nature 411(6837): 521 (2001)
Articles freely available online are more highly cited.
combination of data from CiteSeer and DBLP
What is „special“ about Computer Science?
• Conferences & proceedings are more important than journals (in many subfields)
• More high quality material is available on the (open) Web, e.g. on personal home pages
• There is no abstract/index database used by most computer scientists
Although bibliographic databases such as INSPEC, ABI/INFORM and MLA Bibliography indexed the reported journals of choice for researchers in computer science and literary theory, the researchers did not use them.
[Covi, IPM 1999]
→ DBLP: Person Pages
Person Names
• widely accepted method to identify persons
• names may not be unique
• a person may change her/his name (marriage, emigration to other cultural environment)
• variations of person names …
• abbreviations: Jeffrey D. Ullman, J. D. Ullman, J. Ullman, Jeff Ullman, …
• nicknames: Michael / Mike, William / Bill, Joseph / Joe
• permutations: Liu Bin / Bin Liu
• different transcriptions: Andrei / Andrej / Andrey
• accents: Stephane / Stéphane
• umlauts: Muller / Müller / Mueller
• ligatures: Weiß / Weiss, Åström / Aastrom
• case: Al-A’Ali / Al-A’ali
• hyphens: Hans-Peter / Hans Peter
• composition: MaoLin / Mao Lin, Kenichi / Ken-ichi / Ken’ichi
• postfixes: Karel Culik II, Jr. / Sr.
• typos
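Several of the variations listed above (accents, case, hyphens, composition) can be collapsed mechanically into a rough comparison key. The following is only a minimal sketch under that assumption, not the procedure used for DBLP; the function name name_key is hypothetical.

```python
import unicodedata

def name_key(name: str) -> str:
    """Collapse common spelling variants of a person name into a rough key."""
    # casefold() lowercases and also maps 'ß' to 'ss'
    folded = name.casefold()
    # NFKD splits accented letters into base letter + combining mark
    decomposed = unicodedata.normalize("NFKD", folded)
    # keep base letters only: drops accents, hyphens, apostrophes, dots, spaces
    return "".join(ch for ch in decomposed if ch.isalpha())

print(name_key("Stéphane") == name_key("Stephane"))      # accents
print(name_key("Hans-Peter") == name_key("Hans Peter"))  # hyphens
print(name_key("Ken’ichi") == name_key("Kenichi"))       # composition
```

Such a key does not cover abbreviations, nicknames, permutations or transcriptions, and it treats Müller like Muller but not like Mueller, so it can only propose candidates for a human decision.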
Person Names: Problems
• there should be a 1:1 mapping between persons and person pages
• how best to search for persons?
• how to normalize different spellings?
name normalization costs more than 60% of our time …
Person Name Normalization
• for each new entry we try to locate the authors/editors in the existing collection
• if spellings differ, but we are confident that they are variations of the same person’s name → make them equal
• write out most parts of the name
• person’s preferred spelling?
• for persons with many publications the name spelling usually converges to a stable & correct state
• for persons with a few known publications it is more likely that there are duplicate person pages, incorrect or incomplete spellings
Heuristics in the decision process
• coauthor relationship gives strong indications for the identity of persons
• streams (journals/conference series): condensation points for communities, weaker indication
• same keywords in titles
• time frame ?
• a lot of background knowledge …
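As an illustration only, the first three heuristics could be combined into a simple evidence score. The weights, the record layout and the function name identity_evidence are assumptions made for this sketch; the slides describe a manual decision supported by background knowledge, not a formula.

```python
def identity_evidence(a: dict, b: dict) -> float:
    """Score the evidence that two person entries denote the same person."""
    shared_coauthors = a["coauthors"] & b["coauthors"]
    shared_streams = a["streams"] & b["streams"]
    shared_words = a["title_words"] & b["title_words"]
    # Hypothetical weights: coauthors are strong evidence,
    # shared streams and title keywords are weaker
    return (3.0 * len(shared_coauthors)
            + 1.0 * len(shared_streams)
            + 0.5 * len(shared_words))

# Placeholder data, not taken from DBLP
old_entry = {"coauthors": {"X", "Y"}, "streams": {"VLDB"}, "title_words": {"datalog"}}
new_entry = {"coauthors": {"Y"}, "streams": {"VLDB", "TKDE"}, "title_words": set()}
print(identity_evidence(old_entry, new_entry))
```

A score like this could at most rank candidate pairs for inspection; the time frame and background knowledge mentioned above are not captured.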
Example …
new entry: Marcos Flôres Ferrão
correct the old entry
… for library people this is an old idea: „controlled vocabularies“ / „authority control“
Classification
• ACM CR System
  – classifications only available for small subset of computer science literature
  – often criticized as too inflexible / too coarse
• intellectual classification (on the paper level) is too expensive
Most bibliographic databases use keywords from controlled vocabularies and/or map publications into a thesaurus/ontology.
In DBLP we try to concentrate on authority control for person names and a comprehensive representation of publication streams
Outline
1. History of DBLP
2. Computer Science Publications & DLs
3. Technical Background
4. Demo
Initial Design
• Entry pages
• Collection of HTML tables of contents
• TOCs were parsed to generate „Person Pages“ (customized xmosaic parser)
• Person Index
TOC Parser / Generation of Person Pages
TOCs → Parser → TOC_OUT → mkauthors → Person Pages
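The step from parsed tables of contents to person pages is essentially a regrouping of TOC entries by author. A minimal sketch, assuming each parsed entry is a dictionary with the hypothetical fields authors, title and toc (the real TOC_OUT format is not shown in the slides):

```python
from collections import defaultdict

def build_person_pages(toc_entries):
    """Group parsed TOC entries by author name."""
    pages = defaultdict(list)
    for entry in toc_entries:
        for author in entry["authors"]:
            pages[author].append((entry["toc"], entry["title"]))
    return dict(pages)

entries = [
    {"authors": ["Peter Jaeschke", "Andreas Oberweis", "Wolffried Stucky"],
     "title": "Extending ER Model Clustering by Relationship Clustering.",
     "toc": "db/conf/er/er93.html"},
    {"authors": ["Andreas Oberweis", "Wolffried Stucky"],
     "title": "Die Behandlung von Ausnahmen in Software-Systemen: Eine Literaturübersicht.",
     "toc": "db/journals/wi/wi33.html"},
]
for person, publications in build_person_pages(entries).items():
    print(person, len(publications))
```

The grouping is only as good as the name spellings in the input, which is why name normalization matters so much for person pages.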
… → mkauthors → AUTHORS → author (CGI)
TOC_OUT → title (CGI)
Bibliographic Records
better search engine, citation linking, reviews, annotated bibliographies, …
• assign a unique ID to each publication
• make it accessible by this ID
• store the information in classical bibliographic records
Bibliographic Records
• Idea: BibTeX++ in XML syntax
• Simple DTD
• You may download them from
http://dblp.uni-trier.de/xml/
uncompressed: ~182 MByte (Nov 2003)
<inproceedings mdate="2003-10-26" key="conf/jbidi/CollC02">
<author>Imma Subirats Coll</author>
<author>José Manuel Barrueco Cruz</author>
<title>ReLIS: una biblioteca digital distribuida para Documentación.</title>
<year>2002</year>
<crossref>conf/jbidi/2002</crossref>
<booktitle>JBIDI</booktitle>
<ee>http://mariachi.dsic.upv.es/jbidi/jbidi2002/Camera-ready/Sesion4/S4-2.pdf</ee>
<url>db/conf/jbidi/jbidi2002.html#CollC02</url>
</inproceedings>
mdate: date of last modification
key: record key
<crossref>: key of proceedings record
<ee>: electronic edition
<url>: URL of TOC
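A record of this shape can be read with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree on a single record; the function name read_record and the returned dictionary layout are assumptions for illustration. The complete download may need extra care (file size, character entities declared in the DTD), which this sketch does not address.

```python
import xml.etree.ElementTree as ET

RECORD = """<inproceedings mdate="2003-10-26" key="conf/jbidi/CollC02">
<author>Imma Subirats Coll</author>
<author>José Manuel Barrueco Cruz</author>
<title>ReLIS: una biblioteca digital distribuida para Documentación.</title>
<year>2002</year>
<crossref>conf/jbidi/2002</crossref>
<booktitle>JBIDI</booktitle>
<url>db/conf/jbidi/jbidi2002.html#CollC02</url>
</inproceedings>"""

def read_record(xml_text):
    """Turn one bibliographic record into a plain dictionary."""
    elem = ET.fromstring(xml_text)
    return {
        "type": elem.tag,                  # inproceedings, article, ...
        "key": elem.get("key"),            # unique ID of the publication
        "mdate": elem.get("mdate"),        # date of last modification
        "authors": [a.text for a in elem.findall("author")],
        "title": elem.findtext("title"),
        "year": int(elem.findtext("year")),
        "crossref": elem.findtext("crossref"),  # key of the proceedings record
        "url": elem.findtext("url"),            # position in the TOC
    }

record = read_record(RECORD)
print(record["key"], record["authors"])
```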
<ee>, <url>
• <ee> contains a link to an abstract and/or full text
• for <inproceedings>, <article> and <incollection> the <url> field points into the table of contents of the volume
• <ee> and <url> may contain absolute links (start with http:// or ftp://) or local links (you have to add a DBLP prefix)
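The rule for absolute and local links can be sketched in a few lines. The concrete prefix used here is an assumption (the slides only say that a DBLP prefix has to be added), and resolve_link is a hypothetical helper name.

```python
# Assumed prefix for local links; the slides do not state it explicitly
DBLP_PREFIX = "http://dblp.uni-trier.de/"

def resolve_link(link: str, prefix: str = DBLP_PREFIX) -> str:
    """Return an absolute URL for the content of an <ee> or <url> field."""
    if link.startswith(("http://", "ftp://")):
        return link                       # already absolute
    return prefix + link.lstrip("/")      # local link: add the DBLP prefix

print(resolve_link("db/conf/jbidi/jbidi2002.html#CollC02"))
print(resolve_link("ftp://example.org/paper.ps"))
```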
<proceedings mdate="2003-10-26" key="conf/jbidi/2002">
<title>Actas de las III Jornadas de Bibliotecas Digitales, El Escorial (Madrid), 18-19 de noviembre de 2002</title>
<booktitle>JBIDI</booktitle>
<year>2002</year>
<url>db/conf/jbidi/jbidi2002.html</url>
</proceedings>
<url>: URL of TOC
a journal article:
<article mdate="2002-01-26" key="journals/wi/OberweisS91">
<author>Andreas Oberweis</author>
<author>Wolffried Stucky</author>
<title>Die Behandlung von Ausnahmen in Software-Systemen: Eine Literaturübersicht.</title>
<pages>492-502</pages>
<year>1991</year>
<volume>33</volume>
<journal>Wirtschaftsinformatik</journal>
<number>6</number>
<url>db/journals/wi/wi33.html#OberweisS91</url>
</article>
<inproceedings mdate="2002-01-02" key="conf/er/JaeschkeOS93">
<author>Peter Jaeschke</author>
<author>Andreas Oberweis</author>
<author>Wolffried Stucky</author>
<title>Extending ER Model Clustering by Relationship Clustering.</title>
<pages>451-462</pages>
<year>1993</year>
<booktitle>ER</booktitle>
<url>db/conf/er/er93.html#JaeschkeOS93</url>
<crossref>conf/er/93</crossref>
<cdrom>er93/ER93-P447.pdf</cdrom>
<ee>db/conf/er/JaeschkeOS93.html</ee>
only used for the ACM SIGMOD Anthology
<cite label="CaJA89">conf/er/CarlsonJA89</cite>
<cite label="Chen76">journals/tods/Chen76</cite>
<cite label="FeMi86">journals/cj/FeldmanM86</cite>
<cite label="Mart89">...</cite>
<cite label="Mist91">...</cite>
<cite label="RaSt92">conf/er/RauhS92</cite>
<cite label="ScSt83">...</cite>
<cite label="ScSW79">conf/er/ScheuermannSW79</cite>
<cite label="TeYF86">journals/csur/TeoreyYF86</cite>
<cite label="TWBK89">journals/cacm/TeoreyWBK89</cite>
</inproceedings>
<cite>: citation links
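The <cite> elements store outgoing references; the „Referenced by“ lists of the Anthology are the same links read in the opposite direction. A minimal sketch of that inversion, assuming the citations have already been extracted into a dictionary, and assuming that a <cite> content of "..." marks a reference without a known DBLP key:

```python
from collections import defaultdict

def referenced_by(citations):
    """Invert {citing key: [cited keys]} into {cited key: [citing keys]}."""
    incoming = defaultdict(list)
    for citing_key, cited_keys in citations.items():
        for cited in cited_keys:
            if cited != "...":      # assumed marker for an unresolved reference
                incoming[cited].append(citing_key)
    return dict(incoming)

citations = {
    "conf/er/JaeschkeOS93": [
        "conf/er/CarlsonJA89", "journals/tods/Chen76", "...",
        "journals/csur/TeoreyYF86",
    ],
}
print(referenced_by(citations)["journals/tods/Chen76"])
```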
<proceedings mdate="2002-01-02" key="conf/er/93">
<editor>Ramez Elmasri</editor>
<editor>Vram Kouramajian</editor>
<editor>Bernhard Thalheim</editor>
<title>Entity-Relationship Approach - ER'93, 12th International Conference on the Entity-Relationship Approach, Arlington, Texas, USA, December 15-17, 1993, Proceedings</title>
<booktitle>ER</booktitle>
<series href="db/journals/lncs.html">Lecture Notes in Computer Science</series>
<volume>823</volume>
<publisher>Springer</publisher>
<year>1994</year>
<isbn>3-540-58217-7</isbn>
<url>db/conf/er/er93.html</url>
</proceedings>
XML Parser / Generation of Person Pages
XML-Records → Parser → TOC_OUT → mkauthors → Person Pages
„Advanced Search“: MG
• „Managing Gigabytes“ Software by Witten, Moffat, Bell
• DBLP XML Records → MG Documents
• Filter: matching terms in the required field
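The filter step can be pictured as a post-processing pass over the documents returned by the full-text engine: a hit is kept only if the query terms occur in the requested field. The sketch below illustrates that idea on plain dictionaries; it does not use the MG software, and the record layout and the name field_filter are assumptions.

```python
import re

def field_filter(candidates, field, terms):
    """Keep only records whose given field contains every query term."""
    wanted = [t.lower() for t in terms]
    hits = []
    for record in candidates:
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]
        words = set(re.findall(r"\w+", " ".join(values).lower()))
        if all(t in words for t in wanted):
            hits.append(record)
    return hits

candidates = [
    {"key": "r1", "author": ["Jim Gray"], "title": "The Benchmark Handbook"},
    # hypothetical record: the term matches in the title, not in the author field
    {"key": "r2", "author": ["Some Author"], "title": "Gray codes"},
]
print([r["key"] for r in field_filter(candidates, "author", ["gray"])])
```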
please use the XML records to build your own services …
example:
Bibshare – an environment for bibliographic management
by José H. Canos, Valencia
client layer: MS Word, Emacs, …
federated search layer: Bibshare Search Engine
collection layer: DBLP-API, …-API, BibShare-API
BibTeX (XML) records + BHT → HTML: Tables of Contents, …
„Bibliography HyperText“ (BHT)
• include mechanism (<cite key='…' style='…'>)
• several additional tags: <logo>, <footer>, <ref href="…">, …
• HTML subset
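The include mechanism can be read as a template expansion: every <cite key=… style=…> tag is replaced by a rendering of the bibliographic record with that key. A minimal sketch under that reading; the regular expression, the render callback and the sample record title are assumptions, not the actual DBLP generator.

```python
import re

# Matches <cite key="..."> with an optional style attribute, quoted or unquoted
CITE = re.compile(
    r"""<cite\s+key=["']?([^"'\s>]+)["']?(?:\s+style=["']?([^"'\s>]+)["']?)?\s*>"""
)

def expand_bht(bht, records, render):
    """Replace each <cite> tag by render(record, style)."""
    def replace(match):
        key, style = match.group(1), match.group(2) or "default"
        return render(records[key], style)
    return CITE.sub(replace, bht)

# Hypothetical record content and rendering function
records = {"conf/sigir/AnhM02": {"title": "Example title"}}
render = lambda record, style: f"{record['title']} [{style}]"
print(expand_bht('<li><cite key="conf/sigir/AnhM02" style=ee>', records, render))
```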
<html><head><title>25. SIGIR 2002: Tampere, Finland</title></head>
<body bgcolor="#ffffff" text="#000000" link="#000000">
<logo style="sigir">
<h1>25. <ref href="db/conf/sigir/index.html">SIGIR</ref> 2002: Tampere, Finland</h1>
<hr>
<cite key="conf/sigir/2002">
<cite key="conf/sigir/2002" style="bibtex">
<center><img alt="SIGIR 2002" src="sigir2002logo.gif" border=0 width=108 height=94></center>
<ul>
<li><cite key="conf/sigir/Rijsbergen02" style=ee>
</ul>
<h2>Web Information Retrieval</h2>
<ul>
<li><cite key="conf/sigir/AnhM02" style=ee>
<li><cite key="conf/sigir/ParkPGK02" style=ee>
<li><cite key="conf/sigir/SiC02" style=ee>
<li><cite key="conf/sigir/KraaijWH02" style=ee>
</ul>
<h2>Information Retrieval Theory</h2>
…
<h2>: session title
BHT: Status
• the „dirty“ BHT files are not published
• an XML version of BHT is under construction
• an initial version of most files and a DTD are available from
http://dblp.uni-trier.de/xml/
Design …
• standard design: put all information in bibliographic records
  – the world of publications has a very rich structure → very complex record formats (e.g. MARC)
Design …
• DBLP design: put only very regular information in bibliographic records
  – keep the records simple
• All other information is stored as „semistructured“ hypertext („BHT“)
  – you have the „freedom of HTML“
DBLP Architecture
Entry Pages
„Streams“: Conference Series, Journals
Tables of Content (TOC)
Person Pages
Person Search, Advanced Search → generated answers
Entering Data
Students
WWW, E-Mail
wrapper
BibTeX
BibTeX
…
BHT
<h2>PODS Invited Talk</h2>
<ul>
<li>Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom: Models and Issues in Data Stream Systems. 1-16
<ee>http://...</ee>
</ul>
<h2>Research Session 1: Award Winning Papers</h2>
<ul>
<li>Georg Gottlob, Christoph Koch: Monadic Datalog and the Expressive Power of Languages for Web Information Extraction. 17-28
<ee>http://...</ee>
<li>Chung-Min Chen, Christine T. Cheng: From Discrepancy to Declustering: Near optimal multidimensional declustering strategies for range queries. 29-38
<ee>http://...</ee>
</ul>
...
Problems …
• to get correct & complete information
• amount of work
primary challenge to run DBLP:
management – not technique
better tools should improve productivity
Data Quality
• Accuracy
• Completeness
  – only complete proceedings / journal issues are entered
• Timeliness
• Relevance
• …
DBLP Browser: SemiPort Project
• platform for experiments in visualization
• bibliographic tool for users of DBLP
  – composition of reference lists
  – export in popular formats like BibTeX
• maintenance tool
DBLP Browser
• main memory IR system
  – compressed representation of the bibliographic records
• convenient graphical user interface
• Java / Swing
• first prototype implemented
Outline
1. History of DBLP
2. Computer Science Publications & DLs
3. Technical Background
4. Demo
Thank you for your attention
„Porta Nigra“ – the most famous landmark of Trier
Outline
1. History of DBLP
2. Computer Science Publications & DLs
3. Technical Background
4. Demo
5. Additional Slides: Browser …
<article key="journals/ai/SchreyeBV89">
<author>Danny De Schreye</author>
<author>Maurice Bruynooghe</author>
<author>Kristof Verschaetse</author>
<title>On the Existence of Nonterminating Queries for a Restricted Class of PROLOG-Clauses.</title>
<pages>237-248</pages>
<year>1989</year>
<volume>41</volume>
<journal>Artificial Intelligence</journal>
<number>2</number>
<url>db/journals/ai/ai41.html#SchreyeBV89</url>
</article>
…
Representation of <title>-Fields:
• construct canonical Huffman codes on word level [MG-Book]
• degree of tree-nodes: 213 (0x2a-0xff)
• lexicon: 3-in-4 front coding
• Publications are sorted by journal/booktitle, year
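Front coding stores each word of a sorted lexicon as the length of the prefix it shares with its predecessor plus the remaining suffix. In the 3-in-4 variant, as understood here, every fourth word is kept in full so that a binary search over the block starts remains possible. The sketch below shows the idea with Python tuples; it is not the byte layout used by the browser, and the function names are hypothetical.

```python
import os

def front_encode(sorted_words, block=4):
    """Front-code a sorted word list; the first word of each block stays whole."""
    entries = []
    for i, word in enumerate(sorted_words):
        if i % block == 0:
            entries.append((0, word))               # block start: stored in full
        else:
            previous = sorted_words[i - 1]
            shared = len(os.path.commonprefix([previous, word]))
            entries.append((shared, word[shared:])) # shared prefix length + suffix
    return entries

def front_decode(entries):
    """Rebuild the word list from (prefix length, suffix) pairs."""
    words = []
    for shared, suffix in entries:
        previous = words[-1] if words else ""
        words.append(previous[:shared] + suffix)
    return words

words = ["Compress", "Compressibility", "Compression", "Compressions", "Compressive"]
encoded = front_encode(words)
print(encoded)
print(front_decode(encoded) == words)
```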
…7Comprehending32sibility5-ons0sion2Compress21ibility3+n3onless6Compressions3,ve2.lets2or5Compressors/-ise2+d2s4Comprising/5omisability3,ed4s6Compromising0+r..sing.ter2Compters0,ur/-ing/on2Comptuer.4uatational1-ion4al5Compuations20tional//iting/log4Compulsory/,nd0.ting/s0Comput03abilities2-les3y7Computacional1-ion4,al1ting7Computationai6.lism5,el4ons…
Lexicon of title words
0='*', 1='+', 2=',', 3='-', 4='.', 5='/', 6='0', 7='1', …
Lexicon: title words
<article key="journals/ai/SchreyeBV89">
<author>Danny De Schreye</author>
<author>Maurice Bruynooghe</author>
<author>Kristof Verschaetse</author>
<title>c1c2c3 ...</title>
<pages>237-248</pages>
…
</article>
…
Representation of <author>- and <editor>-Fields:
• construct list of all persons
• sort them by number of publications
• add inverted lists (publication numbers) to all persons
Peter Smith    17 36 1003 7790 8800
Johanna Mayer  36 2077 9002
…
Representation of inverted indexes
• use d-gaps [MG-book]
• variable-byte coding (base 111 numbers, ≈ 7 bit value + last byte flag) [Scholer et al., SIGIR 2002]
[Figure: layout of the lists: n, m, line 1, line 2, line 3, …, line n]
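The two ideas combine as follows: an inverted list of ascending publication numbers is first turned into differences (d-gaps), and each gap is then written with as few bytes as possible. The sketch uses the common textbook variant with 7 payload bits and one flag bit per byte, not the base 111 variant named above; all function names are hypothetical.

```python
def to_gaps(postings):
    """Ascending publication numbers -> differences to the predecessor."""
    gaps, previous = [], 0
    for p in postings:
        gaps.append(p - previous)
        previous = p
    return gaps

def from_gaps(gaps):
    """Inverse of to_gaps: running sum of the gaps."""
    postings, total = [], 0
    for g in gaps:
        total += g
        postings.append(total)
    return postings

def vbyte_encode(numbers):
    """7 payload bits per byte; the high bit marks the last byte of a number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)   # flag clear: more bytes follow
            n >>= 7
        out.append(n | 0x80)       # flag set: last byte
    return bytes(out)

def vbyte_decode(data):
    numbers, value, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(value | ((byte & 0x7F) << shift))
            value, shift = 0, 0
        else:
            value |= byte << shift
            shift += 7
    return numbers

postings = [17, 36, 1003, 7790, 8800]      # the „Peter Smith“ list from the slide
encoded = vbyte_encode(to_gaps(postings))
print(to_gaps(postings), len(encoded))
print(from_gaps(vbyte_decode(encoded)) == postings)
```

Small gaps fit into a single byte, which is why sorting the publications so that related records get nearby numbers helps the compression.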
…Alberto H. F. Laender)s»™"bê~5º"žÙ\ü¢"@Í&ñ¤$¥"ð8ì?ص\ "ÜÖà·$œ¤�"iï$-¯"c—UÚLÌ"?¨"<°#Há"nÆWÕh·â‘#:Ò"=£’ Melvin A. Breuer)#ˆÈ"j¥%ù"û"ß"—"–÷"W¯"0à"†ò$È"¶£"ž"n·\Û*Û&ª´&·í$Ÿ"Ò$¸¸Âö&»#-™"²#½"¼¹ø"'ÁyÅ&þFÌ Jean-Louis Lassez)0щ×qع$4Ò=ÆgÅ";´áM¹5ı#R÷Ò"Í#RÅ#&è#ø"ký¯-Â*˜"ô"·q™)¡)Î. ]ò8ßfŸ"Ì(§$;¬$õGí"æ#¨"aÒ’ Weiyi Meng)P¾1à:¼¶#^ž½Ê"5ÉâEî‘8¸®#Ú&ø%Xž#ס"ÇÔƒô~ú‘$O¨}̲Q܋Ë©#uÚ"‹û#¯-Ÿ$ `´GÉ2¯JÑ'°Vå Walter L. Ruzzo)"SšŽ´“åO«V©r¸ž$4 '›"m—$Æ%jà"<Æ&Ö~òHö#¤¦#ë"Ÿ®%'©Õ' µƒÝ%°À?ø"É‘$§ÅñçQÚš"˜ú �James E. Rumbaugh)o´#%ñ#Í$D®&DãQõ%\ï¦Ú¦¦©¦›žššš›œ¢¢š˜œš›™™�›#$žö§" ËÅ&G«"1ø �Abdul Sattar)8麂Ó5ò‘£’׈”®À »–"S¤´>ÒÜa›&&¸0Ö=Ú0Êl—"mØ#F¹pë#yÃ=š? Jª÷•™$,—%›Iì©Ç²…
Lexicon: title words
<article key="journals/ai/SchreyeBV89">
<author>a1</author>
<author>a2</author>
<author>a3</author>
<title>c1c2c3 ...</title>
<pages>237-248</pages>
<journal>Artificial Intelligence</journal>
…
</article>
…
Person table
Representation of <journal>- and <booktitle>-Fields:
• construct lists of all journals and booktitles
• add position of first publication with this journal/booktitle and count
Artificial Intelligence  5000  603
TODS                     7890  400
…
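Because the publications are sorted by journal/booktitle, each stream occupies one contiguous range of publication numbers, so a table of (name, first position, count) is enough to recover the stream of any record. A minimal sketch of building and querying such a table; the function names and the exact table layout are assumptions.

```python
import bisect

def build_stream_table(stream_names):
    """stream_names[i] is the journal/booktitle of publication i (sorted by stream)."""
    table = []
    for position, name in enumerate(stream_names):
        if table and table[-1][0] == name:
            table[-1][2] += 1                    # one more publication in this stream
        else:
            table.append([name, position, 1])    # new stream starts here
    return [tuple(row) for row in table]

def stream_of(table, pub_no):
    """Find the stream whose range contains publication number pub_no."""
    rows = sorted(table, key=lambda row: row[1])
    starts = [first for _, first, _ in rows]
    i = bisect.bisect_right(starts, pub_no) - 1
    if i >= 0:
        name, first, count = rows[i]
        if pub_no < first + count:
            return name
    raise IndexError(f"no stream contains publication {pub_no}")

table = [("Artificial Intelligence", 5000, 603), ("TODS", 7890, 400)]
print(stream_of(table, 5602), stream_of(table, 7890))
```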
33110509CACM)#^ãPÜTCS)7I¢KÛIEEE Transactions on Computers),[žIµInformation Processing Letters)/KàHÓTSE)89Ô6ÜIEEE Computer)+ã5Ô�JACM)0|˜5½The Computer Journal)8UÒ3ôSIAM J. Comput.)5@ 3œSoftware - Practice and Experience)6zÛ2õArtificial Intelligence)"‹Ê0˜Information and Control)/Ž÷/ØJCSS)12Î/¹IEEE Transactions on Pattern Analysis and Machine Intelligence),Ò.¤�…
Lexicon: title words
<article key="journals/ai/SchreyeBV89">
<author>a1</author>
<author>a2</author>
<author>a3</author>
<title>c1c2c3 ...</title>
<pages>237-248</pages>
<journal>j</journal>
<number>2</number>
<url>db/journals/ai/ai41.html#SchreyeBV89</url>
</article>
…
Person table
Journal table
Booktitle table
Representations of Paths: key, <ee>, <url>
construct paths dictionary: bottom-up tree of path elements
conf
  vldb, spire, sigir
    Gray88, Ullman91, …
Lexicon: title words
Person table
Journal table
Booktitle table
paths dictionary
series table
publishers table
…Av’SchreyeBV89)T½fùf:´föf*ÚñfL•fófìfNŸf3;föfHþl3Km)g#Ÿœuv Ypš¹’b“#û"“"+´�…
Ap1SchreyeBV89Tl1l2…)gn1n2up2Yypjvnb3a1a2a3