Comparing the Scientific Impact of Conference and Journal Publications in Computer Science

Erhard Rahm
http://dbs.uni-leipzig.de
Academic Publishing in Europe (APE) Conf., 2008, Berlin
Jan. 23, 2008

Citation analysis
- Citation analysis is increasingly used to measure the scientific impact of
  - Journals (impact factor)
  - Authors
  - Institutions
- JCR impact factors are limited to journals
  - Much computer science research is published only in conferences
  - Need to consider citations from / to (refereed) conference publications
- Citation analysis is a huge data integration problem
  - Need to automate as much as possible with good data quality


MS Libra statistics (Dec. 2007)
http://libra.msra.cn

                               #venues   #papers (all)   #cited (all)   #papers (top 100 venues)   #cited (top 100 venues)
Journals                           471         321,000      1,655,000                    190,000                 1,434,000
Conference / workshop series     2,297         585,000      1,752,000                    167,000                 1,216,000
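The table can be read as average citations per paper. The sketch below does that arithmetic on the numbers from the slide, assuming that "#cited" is the total number of citations received (the slide does not define it); the dictionary keys and the helper name `cites_per_paper` are made up for illustration.

```python
# Counts copied from the MS Libra table (Dec. 2007).
# Assumption: "#cited" is the total number of citations received.
libra = {
    "journals": {
        "papers": 321_000, "cited": 1_655_000,
        "papers_top100": 190_000, "cited_top100": 1_434_000,
    },
    "conference / workshop series": {
        "papers": 585_000, "cited": 1_752_000,
        "papers_top100": 167_000, "cited_top100": 1_216_000,
    },
}


def cites_per_paper(row, suffix=""):
    """Average citations per paper; suffix "_top100" selects the top 100 venues."""
    return row["cited" + suffix] / row["papers" + suffix]


for kind, row in libra.items():
    print(kind,
          round(cites_per_paper(row), 2),
          round(cites_per_paper(row, "_top100"), 2))
```

Under that reading, journals average about 5.2 citations per paper and conference / workshop series about 3.0 over all venues, while within the top 100 venues the two are close (about 7.5 vs. 7.3).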

Agenda
- Motivation
- In-depth comparison for CS publications on databases
  - Data sources
  - Conference vs. journal impact factors
  - Citation skew, rankings (nation, institution)
- Data integration of bibliographic web data
  - MOMA framework for record matching
  - Online citation service (OCS)
- Summary


Citation analysis of database publications*
- 10 years: 1994 - 2003
- 5 venues:
  - 2 conference series (ACM SIGMOD, VLDB)
  - 3 journals (ACM TODS, VLDB Journal, SIGMOD Record)
- Evaluation using 2005 and 2007 citation data
- DBLP
  - good coverage of CS venues
  - manually curated, good quality
  - no citation counts
- Google Scholar (GS)
  - many citations
  - very good coverage of computer science research
  - data quality problems (duplicates, …) due to automatic information extraction

* Rahm, E., A. Thor: Citation analysis of database publications, ACM Sigmod Record, Dec. 2005

Further Citation Sources
- ACM Digital Library


#citings per source (to papers of considered venues and years)
[Chart: #citings per source; x-axis: 1994 to 2003; y-axis: 0 to 18000; series: Google Scholar, MS Libra, Scopus*, ACM DL, Citeseer, Thomson ISI**; as of Dec. 2007]

* Scopus does not cover VLDB conf.
** ISI does not cover conferences; VLDBJ / SR since 1998/2000

Conferences vs. Journals
[Chart 1: # Publications (1994-2003) per venue; y-axis: 0 to 600]
[Chart 2: # Citings per venue (1994-2003); y-axis: 0 to 50000; series: GS 2005, GS 2007]
Venues in both charts: SIGMOD Conf., VLDB Conf., VLDB Journal, ACM TODS, SIGMOD Record


Conf. vs. Journals: #citings per paper
[Chart: #citings per paper per venue; y-axis: 0 to 120; series: GS 2005, GS 2007; venues: SIGMOD Conf., VLDB Conf., VLDB Journal, ACM TODS, SIGMOD Record]

JCR impact factors for journals
[Chart: JCR impact factor per year; x-axis: 1996 to 2004; y-axis: 0 to 14; series: VLDB Journal, ACM TODS, SIGMOD Record]
- Journal impact factor IF(X) = average #citings in year X for a journal article published in the 2 preceding years X-1 and X-2
- IF can also be determined for annual conference series
- Can be generalized to articles from k preceding years (e.g. k=5)
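The impact factor definition can be written as a short function. This is a minimal sketch of the definition on the slide, not the JCR procedure: the input layout (`papers` as a dict from paper id to publication year, `citations` as a list of `(cited_paper_id, citing_year)` pairs) and the sample data are assumptions made up for illustration.

```python
def impact_factor(papers, citations, year, k=2):
    """IF(year): citings made in `year` to papers published in the k
    preceding years, divided by the number of those papers.

    papers:    dict paper_id -> publication year (hypothetical layout)
    citations: iterable of (cited_paper_id, citing_year) pairs
    """
    window = range(year - k, year)  # year-k .. year-1
    eligible = {pid for pid, pub_year in papers.items() if pub_year in window}
    if not eligible:
        return 0.0
    citings = sum(1 for cited, citing_year in citations
                  if cited in eligible and citing_year == year)
    return citings / len(eligible)


# Hypothetical toy venue: three papers and five citings.
papers = {"p1": 2001, "p2": 2002, "p3": 1999}
citations = [("p1", 2003), ("p1", 2003), ("p2", 2003),
             ("p3", 2003), ("p2", 2004)]

print(impact_factor(papers, citations, 2003))        # k=2: papers from 2001-2002
print(impact_factor(papers, citations, 2003, k=5))   # k=5: papers from 1998-2002
```

The same function applies to an annual conference series, since it only needs publication years and citing years; the choice of `k` controls how many preceding years count.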


GS-based impact factors
[Two charts: GS 2007 and GS 2005; x-axis: 1996 to 2004; y-axis: 0 to 14; series: SIGMOD Conference, VLDB Conf., VLDB J., ACM TODS, SIGMOD Record]
- Consider only citing GS publications with year (ca. 77%)
- SIGMOD conf. > VLDB conf. > Journals
- 2007 data: higher impact factors than 2005 and than using JCR

GS-based impact factors (5 years)
[Two charts: GS 2007 and GS 2005; x-axis: 1999 to 2004; y-axis: 0 to 14; series: SIGMOD Conference, VLDB Conf., VLDB J., ACM TODS, SIGMOD Record]
- Impact factors more stable for 5 years
- Conferences maintain higher impact than journals


Citation skew
- Citation distribution (split by quarters)
- 25% top referenced publications → 60-80% citings
- SR has highest skew, TODS is most balanced
[Chart: share of citings per quarter of publications (25%, 50%, 75%, 100%) and Gini coefficient; y-axis: 0% to 100%; venues: SIGMOD, VLDB, SIGMOD Record, VLDB Journal, TODS]
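Both skew measures on this slide can be computed from a list of per-paper citing counts. The sketch below uses the standard Gini coefficient formula and a simple quartile split; the function names and the sample counts are invented for illustration and are not the data behind the chart. It assumes a non-empty list with at least one citing.

```python
def quartile_shares(citings):
    """Share of all citings received by each quarter of the papers,
    most-cited quarter first. Assumes sum(citings) > 0."""
    ranked = sorted(citings, reverse=True)
    total = sum(ranked)
    n = len(ranked)
    bounds = [round(n * q / 4) for q in range(5)]
    return [sum(ranked[bounds[i]:bounds[i + 1]]) / total for i in range(4)]


def gini(citings):
    """Gini coefficient: 0 means every paper is cited equally often,
    values near 1 mean a few papers receive almost all citings."""
    xs = sorted(citings)
    n = len(xs)
    total = sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical citing counts for the eight papers of one venue.
citings = [100, 40, 20, 10, 10, 8, 6, 6]
print(quartile_shares(citings))  # first entry: share of the top 25% of papers
print(gini(citings))
```

In this made-up example the top quarter of papers receives 70% of the citings, which falls in the 60-80% range the slide reports for the real venues.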

Aggregated Citation Frequencies
- based on institution of first author
- only papers with at least 20 citings (w/o self-citings) are considered
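The two rules above can be sketched as a small aggregation. The record fields (`first_author_institution`, `citings`, `self_citings`), the institution names and the counts are hypothetical, and summing citings per institution is an assumption about what "aggregated" means here.

```python
from collections import Counter


def aggregate_by_institution(papers, min_citings=20):
    """Sum citings (without self-citings) per first-author institution,
    counting only papers that reach `min_citings`."""
    totals = Counter()
    for paper in papers:
        citings = paper["citings"] - paper["self_citings"]
        if citings >= min_citings:
            totals[paper["first_author_institution"]] += citings
    return totals.most_common()  # highest total first


# Hypothetical records; names and numbers are invented for illustration.
papers = [
    {"first_author_institution": "Inst A", "citings": 50, "self_citings": 5},
    {"first_author_institution": "Inst A", "citings": 22, "self_citings": 4},
    {"first_author_institution": "Inst B", "citings": 30, "self_citings": 0},
]
print(aggregate_by_institution(papers))
```

The second paper drops out because it has only 18 citings once self-citings are removed, so the threshold is applied after the self-citing correction.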


Matching objects in web sources

DBLP:
  @article{DBLP:journals/vldb/RahmB01,
    author = {Erhard Rahm and Philip A. Bernstein},
    title = {A survey of approaches to automatic schema matching.},
    journal= {VLDB J.}, year = {2001}, ...

[Figure: sources DBLP, Google Scholar and ACM; label: Information Fusion]


Object matching framework MOMA
- MOMA = Mapping based Object Matching*
- Object consolidation framework
  - Matching objects from 2 sources
  - Generation of instance mappings (correspondences)
  - Special case: duplicate detection within 1 source (generation of self-mapping)
- Key features
  - Extensible matcher library
  - Mapping combination
  - Construction of match workflows
  - Storage of mappings for reuse in other match problems
- Implemented within iFuice data integration platform

Example: same-mapping for authors

SourceA   SourceA'   Sim
a1        a'1        1
a2        a'1        0.9
a3        a'3        0.8

* Thor, Rahm: MOMA - A Mapping-based Object Matching System. Proc. CIDR, 2007
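To illustrate what combining mappings can look like, the sketch below stores a mapping as a dict from an object pair to a similarity value, as in the table above, and defines two hypothetical operators. This is not the MOMA or iFuice API: `combine`, `select`, the aggregation modes and the second matcher's values are assumptions made for illustration.

```python
def combine(m1, m2, how="avg"):
    """Combine two mappings over the same sources.
    A pair missing from one mapping counts as similarity 0 there."""
    if how == "max":
        agg = max
    elif how == "min":
        agg = min
    else:
        agg = lambda a, b: (a + b) / 2
    pairs = set(m1) | set(m2)
    return {p: agg(m1.get(p, 0.0), m2.get(p, 0.0)) for p in pairs}


def select(mapping, threshold):
    """Keep only correspondences with similarity >= threshold."""
    return {p: s for p, s in mapping.items() if s >= threshold}


# First mapping: the values from the table above.
matcher_1 = {("a1", "a'1"): 1.0, ("a2", "a'1"): 0.9, ("a3", "a'3"): 0.8}
# Second mapping: invented output of another matcher.
matcher_2 = {("a1", "a'1"): 1.0, ("a2", "a'1"): 0.5, ("a3", "a'3"): 0.8}

combined = combine(matcher_1, matcher_2, how="avg")
print(select(combined, threshold=0.75))
```

Here the pair (a2, a'1) is dropped after averaging because the second matcher rates it low, which shows how combining matchers can filter out correspondences that a single matcher would accept.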

MOMA Architecture
[Diagram; components shown:]
- Inputs: LDS A, LDS B
- Match Workflow: Matcher 1, Matcher 2, ..., Matcher n; Mapping Combiner (Mapping Operator, Selection)
- Output: Same Mapping
- Matcher Library: matcher implementations (e.g., attribute based), match workflows
- Mapping Repository
- Mapping Cache


On-demand citation analysis
- On-demand citation service (OCS)*
  - What are the most cited papers of conference X?
  - What is the average citation number of publications from author Y?
  - Frequent changes, i.e., new publications & new citations
- Idea: Combine publication lists, e.g. from DBLP or Pubmed, with citation counts, e.g. from GS, Citeseer or Scopus
  - DBLP, Pubmed: high bibliographic data quality
  - GS: large coverage of citation counts
- Query problem: Given a set of DBLP publications → How to find the corresponding GS publications?
  - Query GS and match DBLP-GS

* Thor, Aumueller, Rahm: Data Integration Support for Mashups. Proc. IIWeb, 2007
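The "query GS and match DBLP-GS" step can be sketched with a plain title comparison. This is a minimal illustration and not the OCS implementation: the matching rule (normalized title similarity from Python's `difflib`), the threshold, the `cited_by` field and the citation counts are assumptions, and no real Google Scholar query is made.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and strip punctuation so formatting differences do not matter."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def title_sim(a, b):
    """Similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def gs_citations(dblp_title, gs_entries, threshold=0.9):
    """Return (sum of citation counts, matching entries) for all GS entries
    whose title is similar enough to the DBLP title."""
    matches = [e for e in gs_entries
               if title_sim(dblp_title, e["title"]) >= threshold]
    return sum(e["cited_by"] for e in matches), matches


dblp_title = "A survey of approaches to automatic schema matching."
# Hypothetical GS result list; the citation counts are invented.
gs_entries = [
    {"title": "A Survey of Approaches to Automatic Schema Matching", "cited_by": 10},
    {"title": "A survey of approaches to automatic schema matching", "cited_by": 3},
    {"title": "An unrelated paper on query optimization", "cited_by": 7},
]

total, matched = gs_citations(dblp_title, gs_entries)
print(total, len(matched))
```

Summing over all matching entries is deliberate: a source built by automatic extraction may list the same publication more than once, so one clean DBLP record can correspond to several GS entries.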

Online Citation Service: Result overview
[Screenshot with callouts:]
- Bibliographic data from DBLP
- Sum of GS citations
- Corresponding GS publications


OCS example: Top conference papers
[Screenshot]

OCS example: Top journal papers
[Screenshot]


Summary
- Large scientific impact of conference publications in computer science
  - Must be considered for a meaningful citation analysis
  - In some fields, e.g. database research, top conferences receive many more citings than top journals
- Impact factors should be extended to major conferences
- #citings are highly skewed within venues -> need for individual (per author/organization etc.) impact analysis
  - not just #publications and general venue impact
- Need for improved data integration on heterogeneous data sources (more automatic, high data quality)
  - U Leipzig: new research prototypes for data integration, object matching and on-demand citation analysis

