Page 1

Musings at the Crossroads of Digital Libraries, Information Retrieval, and Scientometrics

http://bit.ly/rguCabanac2012

Guillaume Cabanac [email protected]

March 28th, 2012

Page 2

Outline of these Musings


Musings at the Crossroads of DL, IR, and SCIM

Guillaume Cabanac

Digital Libraries: Collective annotations, Social validation of discussion threads, Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation, Geographic IR, Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues, Landscape of research in Information Systems, The submission-date bias in peer-reviewed conferences

Page 3

Digital Libraries: Collective annotations, Social validation of discussion threads, Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation, Geographic IR, Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues, Landscape of research in Information Systems, The submission-date bias in peer-reviewed conferences

Outline of these Musings

Page 4

Digital Libraries: Collective annotations, Social validation of discussion threads, Organization-based document similarity

Question DL-1

How to transpose paper-based annotations into digital documents?


Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Collective annotation: Perspectives for information retrieval improvement.” RIAO’07: Proceedings of the 8th Conference on Information Retrieval and its Applications, pages 529–548. CID, May 2007.

Page 5

Characteristics of paper annotation: a centuries-old activity (more than four centuries); numerous application contexts: theology, science, literature…; personal use: “active reading” (Adler & van Doren, 1972)

Collective use: review process, opinion exchange …

From Individual Paper-based Annotation …

[Timeline of paper-based annotation: 1541, an annotated Bible (Lortsch, 1910); 1630, Fermat’s last theorem in a margin (Kleiner, 2000); 1790–1830, annotations from Blake, Keats… (Jackson, 2001); 1881, Les Misérables by Victor Hugo; 1998, US students (Marshall, 1998)]

Page 6

… to Collective Digital Annotations

[Diagram: from hard-to-share, ‘lost’ hardcopy annotations to collective digital annotations stored on Web annotation servers (Ovsiannikov et al., 1999) and organized as discussion threads; more than 20 annotation systems appeared between 1993 and 2005, e.g. ComMentor, iMarkup, Yawas, Amaya (Cabanac et al., 2005); 87% of annotations come from authors vs. 13% from readers]

Page 7

W3C Annotea / Amaya (Kahan et al., 2002)

Digital Document Annotation: Examples

a reader’s comment

discussion thread

Arakne, featuring “fluid annotations” (Bouvin et al., 2002)

Page 8

Collective Annotations: reviewed 64 systems designed during 1989–2008

A collective annotation carries:

- Objective data: owner, creation date, anchoring point within the document, granularity (whole document, words…)
- Subjective information: comments, various marks (stars, underlined text…), annotation types (support/refutation, question…), visibility (public, private, group…)
- Purpose-oriented annotation categories: remark, reminder, argumentation
- A personal annotation space

Page 9

Digital Libraries: Collective annotations, Social validation of discussion threads, Organization-based document similarity

Question DL-2

How to measure the social validity of a statement according to the argumentative discussion it sparked off?


Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Social validation of collective annotations: Definition and experiment.” Journal of the American Society for Information Science and Technology, 61(2):271–287, Feb. 2010, Wiley. DOI:10.1002/asi.21255

Page 10

Scalability issue

Which annotations should I read?

Social validation = degree of consensus of the group

Social Validation

Social Validation of Argumentative Debates

Page 11

Social Validation of Argumentative Debates

Before: annotation magma

After: filtered display

Informing readers about how validated each annotation is

Page 12

Overview

Two proposed algorithms Empirical Recursive Scoring Algorithm (Cabanac et al., 2005)

Bipolar Argumentation Framework Extension based on Artificial Intelligence research works (Cayrol & Lagasquie-Schiex, 2005)

Social Validation Algorithms

[Figure: validity scale from –1 (socially refuted) through 0 (socially neutral) to +1 (socially confirmed), with four example cases of debates between annotations A and B]
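The scale above lends itself to a small sketch. The following computes an illustrative recursive consensus score over a reply tree, assuming each reply either supports (+1) or refutes (–1) its parent; the node encoding is invented for the example, and this is a toy illustration of the idea, not the published Empirical Recursive Scoring Algorithm nor the bipolar argumentation framework.

```python
def social_validity(node):
    """Score an annotation in [-1, 1] from the debate it sparked.

    `node` is (polarity, replies): polarity is +1 if the node supports
    its parent, -1 if it refutes it; replies is a list of child nodes.
    Toy sketch only -- not the published ERSA algorithm.
    """
    polarity, replies = node
    if not replies:
        return 1.0  # an unchallenged statement counts as fully valid
    # Each reply contributes its own validity, signed by whether it
    # supports or refutes the current statement.
    return sum(r[0] * social_validity(r) for r in replies) / len(replies)

# A statement whose only reply is an unchallenged refutation:
debate = (+1, [(-1, [])])
print(social_validity(debate))  # -1.0: socially refuted
```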

Page 13

Example

Computing the social validity of a debated annotation

Social Validation Algorithm

Page 14

Validation with a User-study

Design

Corpus: 13 discussion threads = 222 annotations + answers

Task of a participant: label the opinion type; infer the overall opinion

Volunteer subjects

53

119

Aim: social validation vs human perception of consensus

Page 15

Q1 Do people agree when labeling opinions? Kappa coefficient (Fleiss, 1971; Fleiss et al., 2003)

Inter-rater agreement among n > 2 raters

Weak agreement, with variability: a subjective task
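The agreement statistic can be computed directly from Fleiss (1971). A minimal self-contained implementation, assuming a subjects × categories matrix of rating counts with a constant number of raters per subject:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for inter-rater agreement among n > 2 raters
    (Fleiss, 1971). `ratings[i][j]` counts the raters who assigned
    subject i to category j; every row must sum to the same n."""
    n = sum(ratings[0])                       # raters per subject
    total = sum(sum(row) for row in ratings)
    k = len(ratings[0])                       # number of categories
    # Expected chance agreement from overall category proportions.
    p = [sum(row[j] for row in ratings) / total for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    # Observed agreement: fraction of concordant rater pairs per subject.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / len(P_i)
    return (P_bar - P_e) / (1 - P_e)
```

Kappa is 1 for perfect agreement and near or below 0 for chance-level agreement, which is why the per-debate values range from “poor” to “fair to good”.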

Experimenting the Social Validation of Debates

[Chart: value of kappa per debate id; agreement ranges from poor to fair-to-good]

Page 16

Q2: How well does SV approximate HP? HP = human perception of consensus; SV = social validation algorithm

1. Test whether HP and SV differ: Student’s paired t-test, (p = 0.20) > (α = 0.05)

2. Correlate HP and SV: Pearson’s coefficient of correlation r

r(HP, SV) = 0.48 shows a weak correlation
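Both statistics are standard; Pearson’s r, for instance, fits in a few lines (the `hp`/`sv` naming merely mirrors the discussion above):

```python
def pearson_r(hp, sv):
    """Pearson's correlation coefficient between paired samples,
    e.g. human perception (HP) vs. the social validation score (SV)."""
    n = len(hp)
    mx, my = sum(hp) / n, sum(sv) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(hp, sv))
    vx = sum((x - mx) ** 2 for x in hp)
    vy = sum((y - my) ** 2 for y in sv)
    return cov / (vx * vy) ** 0.5
```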

Experimenting the Social Validation of Debates

[Chart: density y = p(HP – SV) of the differences between human perception and social validation; for example, HP = SV in 24% of all cases]

Page 17

Digital Libraries: Collective annotations, Social validation of discussion threads, Organization-based document similarity

Question DL-3

How to harness a quiescent capital present in any community: its documents?


Guillaume Cabanac, Max Chevalier, Claude Chrisment, Christine Julien. “Organization of digital resources as an original facet for exploring the quiescent information capital of a community.” International Journal on Digital Libraries, 11(4):239–261, dec. 2010, Springer. DOI:10.1007/s00799-011-0076-6

Page 18

Personal documents: filtered, validated, organized information…

… relevant to activities in the organization

Paradox: profitable, but under-exploited Reason 1 – folders and files are private

Reason 2 – manual sharing

Reason 3 – automated sharing

Consequences: people resort to resources available outside of the community; weak ROI: why look outside when it is already there?

Documents as a Quiescent Wealth

Page 19

Mapping the documents of the community: SOM [Kohonen, 2001], Umap [Triviumsoft], TreeMap [Fekete & Plaisant, 2001]…

Limitations: find the documents with the same topics as D; find the documents that colleagues use with D

Concept of usage: grouping documents ⇆ keeping stuff in common

How to Benefit from Documents in a Community?

Page 20

Organization-based similarities: inter-folder, inter-document, inter-user

How to Benefit from Documents in a Community?

Page 21

Purpose: Offering a global view of … people and their documents

Based on document contents Based on document usage/organization

Requirement: non-intrusiveness and confidentiality

Operational needs: find documents with related materials or with complementary materials; seeking people ⇆ seeking documents

Managerial needs: visualize the global/individual activity; relate a work position to its required documents

How to Help People to Discover/Find/Use Documents?


Page 22

4 views = {documents, people} × {group, unit}

1. Group of documents: main topics, usage groups

2. A single document: who to liaise with? what to read?

3. Group of people: community of interest, community of use

4. A single person: interests, similar users (potential help)

Proposed System: Static Aspect

Page 23

Digital Libraries: Collective annotations, Social validation of discussion threads, Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation, Geographic IR, Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues, Landscape of research in Information Systems, The submission-date bias in peer-reviewed conferences

Outline of these Musings

Page 24

Question IR-1

Does document tie-breaking affect the evaluation of Information Retrieval systems?


Information Retrieval: The tie-breaking bias in IR evaluation, Geographic IR, Effectiveness of query operators

Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment. “Tie-breaking bias: Effect of an uncontrolled parameter on information retrieval evaluation.” M. Agosti, N. Ferro, C. Peters, M. de Rijke, and A. F. Smeaton (Eds.) CLEF’10: Proceedings of the 1st Conference on Multilingual and Multimodal Information Access Evaluation, volume 6360 of LNCS, pages 112–123. Springer, Sep. 2010. DOI:10.1007/978-3-642-15998-5_13

Page 25

Measuring the Effectiveness of IR Systems: user-centered vs. system-focused [Spärck Jones & Willett, 1997]

Evaluation campaigns: 1958 Cranfield, UK; 1992 TREC (Text Retrieval Conference), USA; 1999 NTCIR (NII Test Collection for IR Systems), Japan; 2001 CLEF (Cross-Language Evaluation Forum), Europe…

“Cranfield” methodology: a task and a test collection (corpus, topics, qrels); measures: MAP, P@X… computed using trec_eval

[Voorhees, 2007]

Page 26

Runs are Reordered Prior to Their Evaluation

Qrels = qid, iter, docno, rel
Run = qid, iter, docno, rank, sim, run_id

Reordering by trec_eval: qid asc, sim desc, docno desc

Effectiveness measure = f(intrinsic_quality, reordering) for MAP, P@X, MRR…
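The reordering rule can be made concrete with a toy run (docnos invented for the example): trec_eval sorts by qid ascending, then sim descending, and breaks score ties by docno descending, regardless of the ranks the system submitted.

```python
# Run lines are (qid, docno, rank, sim).
run = [
    (301, "AP880212-0001", 1, 0.8),
    (301, "AP880212-0003", 2, 0.5),   # submitted before the WSJ doc...
    (301, "WSJ870811-0002", 3, 0.5),  # ...but tied with it on sim
]

# Stable two-pass sort: docno descending first, then qid asc / sim desc,
# so score ties end up broken by docno -- the tie-breaking bias.
run.sort(key=lambda r: r[1], reverse=True)
run.sort(key=lambda r: (r[0], -r[3]))

print([docno for _, docno, _, _ in run])
# ['AP880212-0001', 'WSJ870811-0002', 'AP880212-0003']
```

Note how the WSJ document jumps ahead of its tied AP competitor purely because "WSJ…" sorts after "AP…", which is exactly the Wall Street Journal vs. Associated Press quip on the next slide.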

Page 27

Consequences of Run Reordering: measures of effectiveness for an IRS s

RR(s,t) 1/rank of the 1st relevant document, for topic t

P(s,t,d) precision at document d, for topic t

AP(s,t) average precision for topic t

MAP(s) mean average precision
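For reference, RR and AP can be computed in a few lines; this is the standard textbook formulation, not the trec_eval source:

```python
def average_precision(ranked, relevant):
    """AP(s, t): mean of the precision values observed at each rank
    where a relevant document appears, over all relevant documents."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """RR(s, t): 1 / rank of the first relevant document (0 if none)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1 / rank
    return 0.0

# Two relevant documents, retrieved at ranks 2 and 3:
print(average_precision(["d3", "d1", "d2"], {"d1", "d2"}))
```

MAP(s) is then simply the mean of AP(s, t) over all topics t, which is why a tie broken differently at a single rank shifts AP and, averaged out, MAP.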

Tie-breaking bias

Is the Wall Street Journal collection more relevant than Associated Press?

Problem 1: comparing 2 systems, AP(s1, t) vs. AP(s2, t)

Problem 2: comparing 2 topics, AP(s, t1) vs. AP(s, t2)

[Figure: two runs (Chris, Ellen) with tied documents; the measures are sensitive to document rank]

Page 28

What we Learnt: Beware of Tie-breaking for AP. Little effect on MAP, larger effect on AP. Measure bounds: AP_Realistic, AP_Conventional, AP_Optimistic

Failure analysis for the ranking process: the error bar = element of chance = potential for improvement (run padre1, adhoc’94)

Page 29

Question IR-2

How to retrieve documents matching keywords and spatiotemporal constraints?


Information Retrieval: The tie-breaking bias in IR evaluation, Geographic IR, Effectiveness of query operators

Damien Palacio, Guillaume Cabanac, Christian Sallaberry, Gilles Hubert. “On the evaluation of geographic information retrieval systems: Evaluation framework and case study.” International Journal on Digital Libraries, 11(2):91–109, june 2010, Springer. DOI:10.1007/s00799-011-0070-z

Page 30

Geographic Information Retrieval: query = “Road trip around Aberdeen summer 1982”

Search engines: term {road, trip, Aberdeen, summer}

Geographic: topic term {road, trip, Aberdeen, summer}; spatial {AberdeenCity, AberdeenCounty…}; temporal [21-JUN-1982 .. 22-SEP-1982]

1 in 6 queries is geographic: Excite (Sanderson et al., 2004), AOL (Gan et al., 2008), Yahoo! (Jones et al., 2008)

Current issue worth studying

Page 31

The Internals of a Geographic IR System: 3 dimensions to process (topical, spatial, temporal)

1 index per dimension. Topical: bag of words, stemming, weighting, comparison with the VSM… Spatial: spatial entity detection, spatial relation resolution… Temporal: temporal entity detection…

Query processing with sequential filtering, e.g. priority to the theme, then filtering according to the other dimensions

Issue: effectiveness of GIRSs vs state-of-the-art IRSs?

Hypothesis: GIRSs better than state-of-the-art IRSs

Page 32

Case Study: the PIV GIR System

Indexing: one index per dimension (topical = Terrier IRS; spatial = tiling; temporal = tiling)

Retrieval: identification of the 3 dimensions in the query; routing towards each index; combination of results with CombMNZ [Fox & Shaw, 1993; Lee, 1997]
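CombMNZ itself is simple: sum each document's scores across the result lists and multiply by the number of lists that retrieved it (Fox & Shaw, 1993). A sketch with toy scores, assuming each list's scores are already normalized to [0, 1]:

```python
from collections import defaultdict

def comb_mnz(result_lists):
    """Fuse several {docno: score} result lists with CombMNZ:
    score(d) = (number of lists retrieving d) * (sum of d's scores)."""
    sums, hits = defaultdict(float), defaultdict(int)
    for results in result_lists:
        for doc, score in results.items():
            sums[doc] += score
            hits[doc] += 1
    fused = {doc: hits[doc] * sums[doc] for doc in sums}
    return sorted(fused, key=fused.get, reverse=True)

# Fusing topical, spatial, and temporal lists, as in PIV (toy scores):
topical = {"d1": 0.9, "d2": 0.4}
spatial = {"d2": 0.8, "d3": 0.7}
temporal = {"d2": 0.5}
print(comb_mnz([topical, spatial, temporal]))  # ['d2', 'd1', 'd3']
```

The MNZ multiplier rewards documents retrieved along all three dimensions: d2 wins despite never having the best single score.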

Page 33

Case Study: the PIV GIR System Principle of CombMNZ and Borda Count

Page 34

Case Study: the PIV GIR System Gain in effectiveness

Page 35

Question IR-3

Do operators in search queries improve the effectiveness of search results?


Information Retrieval: The tie-breaking bias in IR evaluation, Geographic IR, Effectiveness of query operators

Gilles Hubert, Guillaume Cabanac, Christian Sallaberry, Damien Palacio. “Query operators shown beneficial for improving search results.” S. Gradmann, F. Borri, C. Meghini, H. Schuldt (Eds.) TPDL’11: Proceedings of the 1st International Conference on Theory and Practice of Digital Libraries, volume 6966 of LNCS, pages 118–129. Springer, Sep. 2011. DOI:10.1007/978-3-642-24469-8_14

Page 36

Various operators: quotation marks, must appear (+), boosting operator (^), Boolean operators, proximity operators…

Information need

“I’m looking for research projects funded in the DL domain”

Regular query vs. query with operators

Search Engines Offer Query Operators

Page 37

Our Research Questions


Page 38

Our Methodology in a Nutshell

[Diagram: a regular query and query variants with operators (V1, V2, V3, V4 … VN)]

Page 39

Effectiveness of Query Operators: TREC-7 per-topic analysis, boxplots for ‘+’ and ‘^’

Page 40

Effectiveness of Query Operators Per Topic Analysis: Box plot

[Box plot: AP (Average Precision), from 0.1 to 0.4, across 32 topics; for each topic, the AP of TREC’s regular query against the query variants with the highest and lowest AP]

Page 41

Effectiveness of Query Operators: TREC-7 per-topic analysis for ‘+’ and ‘^’; MAP = 0.1554 vs. MAP┬ = 0.2099, i.e. +35.1%

Page 42

Digital Libraries: Collective annotations, Social validation of discussion threads, Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation, Geographic IR, Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues, Landscape of research in Information Systems, The submission-date bias in peer-reviewed conferences

Outline of these Musings

Page 43

Question SCIM-1

How to recommend researchers according to their research topics and social clues?


Scientometrics: Recommendation based on topics and social clues, Landscape of research in Information Systems, The submission-date bias in peer-reviewed conferences

Guillaume Cabanac. “Accuracy of inter-researcher similarity measures based on topical and social clues.” Scientometrics, 87(3):597–620, june 2011, Springer. DOI:10.1007/s11192-011-0358-1

Page 44

Recommendation of Literature (McNee et al., 2006)

Collaborative filtering. Principle: mining the preferences of researchers (“those who liked this paper also liked…”). Risk of a snowball effect / fad: what about innovation and the relevance of the theme?

Cognitive filtering. Principle: mining the contents of articles; profiles of resources (researchers, articles), citation graph

Hybrid approach

Page 45

Foundations: Similarity Measures Under Study

Model: coauthorship graph (authors ⇆ authors); venues graph (authors ⇆ conferences/journals)

Social similarities: inverse degree of separation (length of the shortest path); strength of the tie (number of shortest paths); shared conferences (number of shared conference editions)

Thematic similarity: cosine on the Vector Space Model, d_i = (w_i1, …, w_in), built on titles (documents / researchers)
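The thematic similarity is a plain cosine between bag-of-words vectors built from titles. A sketch with toy titles; the weights here are raw term frequencies, whereas the paper's exact weighting scheme may differ:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Researcher profiles built from the words of their paper titles:
r1 = Counter("collective annotation information retrieval".split())
r2 = Counter("social validation of collective annotation".split())
print(round(cosine(r1, r2), 3))  # 0.447
```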

Page 46

Computing Similarities with Social Clues: the task of literature review

Requirement: topical relevance. Preference: social proximity (meetings, projects…) → re-rank topical results with social clues

Combination with CombMNZ (Fox & Shaw, 1993). Final result: a list of recommended researchers

[Diagram: the three social similarities (degree of separation, strength of ties, shared conferences) are fused by CombMNZ into a social list, which is then combined with the topical list by a second CombMNZ into the final TS list]

Page 47

Evaluation Design: comparing recommendations with researchers’ perception. Q1: How effective are topical-only recommendations? Q2: What is the gain due to integrating social clues?

IR experiments: Cranfield paradigm (TREC…) Does the search engine retrieve relevant documents?

[Diagram: the Cranfield evaluation loop. A search engine takes a corpus and topics as input; an assessor judges whether each document is relevant, yielding relevance judgments (binary {0, 1} or gradual [0, N]) stored as qrels; trec_eval then computes effectiveness measures such as Mean Average Precision and Normalized Discounted Cumulative Gain.]

topic   S1       S2
1       0.5687   0.6521
…       …        …
50      0.7124   0.7512
avg     0.6421   0.7215

Improvement: +12.3%; significance: p < 0.05 (paired t-test)

Page 48

Evaluating Recommendations

[Diagram: the same Cranfield loop adapted to recommendations. The recommender system takes the name of a researcher as input; that researcher assesses the top 25 recommended peers (topical vs. topical + social), answering “With whom would you like to chat for improving your research?”]

Page 49

Experiment Features

Data: dblp.xml (713 MB = 1.3M publications by 811,787 researchers). Subjects: 90 researchers contacted by mail; 74 began to fill in the questionnaire and 71 completed it

Interface for assessing recommendations

Page 50

Experiments: Profile of the Participants. Experience of the 71 subjects: Mdn = 13 years. Productivity of the 71 subjects: Mdn = 15 publications


[Histograms: number of participants by seniority (years) and by number of publications]

Page 51

Empirical Validation of our Hypothesis. Strong baseline: an effective approach based on the VSM. +8.49% = significant improvement (p < 0.05; n = 70) of topical recommendations by social clues

[Bar chart: NDCG of topical vs. topical + social recommendations (0.5 to 1.0), globally and split by productivity and experience; improvements: global +8.49%, < 15 publications +10.39%, ≥ 15 publications +7.03%, < 13 years +6.50%, ≥ 13 years +10.22%]


Page 52

Question SCIM-2

What is the landscape of research in Information Systems from the perspective of gatekeepers?


Scientometrics: Recommendation based on topics and social clues, Landscape of research in Information Systems, The submission-date bias in peer-reviewed conferences

Guillaume Cabanac. “Shaping the landscape of research in Information Systems from the perspective of editorial boards: A scientometric study of 77 leading journals.” Journal of the American Society for Information Science and Technology, 63, to appear in 2012, Wiley. DOI:10.1002/asi.22609

Page 53

Landscape of Research in Information Systems The gatekeepers of science

Page 54

Landscape of Research in Information Systems The 77 core peer-reviewed IS journals in the WoS

Page 55

Landscape of Research in Information Systems Exploratory data analysis

Page 56

Landscape of Research in Information Systems Exploratory data analysis

Page 57

Landscape of Research in Information Systems Topical map of the IS field

Page 58

Landscape of Research in Information Systems: Most influential gatekeepers

Page 59

Landscape of Research in Information Systems Number of gatekeepers per country

Page 60

Landscape of Research in Information Systems Geographic and gender diversity

Page 61

Question SCIM-3

What if submission date influenced the acceptance of conference papers?


Scientometrics: Recommendation based on topics and social clues, Landscape of research in Information Systems, The submission-date bias in peer-reviewed conferences

Guillaume Cabanac. “What if submission date influenced the acceptance of conference papers?” Submitted to the Journal of the American Society for Information Science and Technology, Wiley.

Page 62

Conferences Affected by a Submission-Date Bias? Peer review

Page 63

The Submission-Date bias Dataset from the ConfMaster conference management system

Page 64

The Submission-Date bias Influence of submission date on bids

Page 65

The Submission-Date bias Influence of submission date on average marks

Page 66

Conclusion


Digital Libraries: Collective annotations, Social validation of discussion threads, Organization-based document similarity

Information Retrieval: The tie-breaking bias in IR evaluation, Geographic IR, Effectiveness of query operators

Scientometrics: Recommendation based on topics and social clues, Landscape of research in Information Systems, The submission-date bias in peer-reviewed conferences

Page 67

Thank you

http://www.irit.fr/~Guillaume.Cabanac

Twitter: @tafanor

