arXiv:2001.02344v1 [cs.IR] 8 Jan 2020
Citation Recommendations Considering Content and Structural Context Embedding

1st Yang Zhang
Graduate School of Informatics
Kyoto University, Kyoto, Japan
[email protected]

2nd Qiang Ma
Graduate School of Informatics
Kyoto University, Kyoto, Japan
[email protected]

Abstract—The number of academic papers being published is increasing exponentially in recent years, and recommending adequate citations to assist researchers in writing papers is a non-trivial task. Conventional approaches may not be optimal, as the recommended papers may already be known to the users, or be solely relevant to the surrounding context but not other ideas discussed in the manuscript. In this work, we propose a novel embedding algorithm, DocCit2Vec, along with the new concept of “structural context”, to tackle the aforementioned issues. The proposed approach demonstrates superior performance to baseline models in extensive experiments designed to simulate practical usage scenarios.

Index Terms—Citation Recommendation, Document Embedding, Information Retrieval, Hyper-document, Link Prediction

I. INTRODUCTION

When writing a paper, one of the common questions arising in many researchers’ minds is that of which paper to cite for a certain context. However, with the exponentially increasing number of published papers, finding suitable citations is a considerable challenge.

Researchers generally rely on keyword-based recommenders, such as Google Scholar¹ and Mendeley². However, keyword-based systems often yield unsatisfying results, as the query words may not carry adequate information to reflect the need to find relevant papers [1], [2]. Therefore, many approaches have been proposed to tackle this issue from various perspectives. For example, [1], [3]–[6] attempt to recommend papers based on input seed papers. Some studies have considered personalized inputs, such as meta-data (title, abstract, keyword list, and publication year) [7], [8]. In addition, the studies [9]–[11] proposed making recommendations based on the “local context” (the context surrounding a citation). Context-based recommendations are considered to be practical for aiding the paper-writing process. However, due to the limitation of only utilizing the local context, these approaches may not be effective in some scenarios. For example, when a writer has finished writing a section of the manuscript, how can citations be effectively recommended to flesh out the content? Suppose that an author is writing a manuscript as in the example shown in Figure 1. Some parts of the manuscript are finished, and suitable citations have been inserted (content in the blue box). However, some parts have just been written, where citations have not yet been inserted (content in the red box). The objective of this work is to design a model to find relevant papers for freshly written content in a manuscript.

This work is partly supported by KAKEN (19H04116).
¹ https://scholar.google.com/
² https://www.mendeley.com/

Fig. 1: Citation recommendations for fleshing out content.

Conventional context-based approaches may not be optimal, for three reasons:

1) Previously cited papers are considered to be “redundant” for fleshing out the content. Users are already aware of these papers, and will therefore cite them when fleshing out the content, without a recommendation, if they are relevant.

2) Previously cited papers contain information on co-citations, which may lead to more effective recommendations. For example, if the papers A, B, C, and D are frequently cited together, then it is likely that a paper will cite D if it has already cited A, B, and C.

3) An effective citation recommendation may not only be relevant to the idea discussed in the local context, but could also be linked to ideas that have been discussed in previous contexts.
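The co-citation signal in reason 2 can be illustrated with a short script. The bibliographies, paper ids, and the `suggest` helper below are hypothetical illustrations of counting co-citation pairs, not part of the proposed model:

```python
from collections import Counter
from itertools import combinations

# Hypothetical bibliographies: each entry lists the papers one manuscript cites.
bibliographies = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "D"],
    ["A", "B", "C"],
    ["B", "E"],
]

# Count how often each pair of papers is cited together.
cocitations = Counter()
for bib in bibliographies:
    for pair in combinations(sorted(bib), 2):
        cocitations[pair] += 1

def suggest(already_cited, counts):
    """Score uncited papers by their co-citation counts with the cited set."""
    scores = Counter()
    for (p, q), c in counts.items():
        if p in already_cited and q not in already_cited:
            scores[q] += c
        elif q in already_cited and p not in already_cited:
            scores[p] += c
    return scores.most_common()

# A manuscript citing A, B, C sees D ranked first: D co-occurs twice with
# each of A, B, and C, while E co-occurs only once with B.
print(suggest({"A", "B", "C"}, cocitations)[0][0])  # → D
```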

To provide a solution to the above-mentioned issues, we introduce the “structural context”, in addition to the contexts containing word information (local contexts), which represents all the existing citations in the manuscript. This is motivated by the following considerations:

1) The structural context carries information on all the existing citations, and can thus avoid recommending citations already known to the user.

2) It can effectively recommend co-cited papers.

3) Existing citations are relevant to various ideas in the fleshed-out content.

Considering these facts, we propose DocCit2Vec as an improved citation recommendation approach. This is a novel embedding algorithm, designed to embed academic papers as dense vectors by preserving information on the content, local context, and structural context.

The major contributions of this paper are summarized as follows:

1) The new concept of “structural context” is introduced to improve citation recommendations in practical scenarios (Section III).

2) We propose the novel embedding model DocCit2Vec, and implement this with two neural network designs to explore the ability of an attention mechanism in citation recommendation tasks compared with a conventional average hidden layer (Section IV).

3) The models are validated through extensive experiments, including three citation recommendation tasks and two classification tasks (Section V).

The remainder of this paper is structured as follows. Section II summarizes related studies. Section III provides notations, definitions, and the problem statement. Then, the embedding models and mathematical expressions are described in Section IV, and the experiments and results are explained in Section V. Finally, Section VI concludes the paper.

II. RELATED WORK

A. Document Embedding

Document embedding studies the representation of documents as continuous vectors. The Word2Vec model [12], [13] proposed a simple neural network to learn representation vectors for words in a given context. Doc2Vec [14] further expanded this model to embed documents. However, these two models suffer from information loss concerning links when embedding linked documents. HyperDoc2Vec [9] was developed to incorporate the link information into dense vectors. Nevertheless, when applied to our scenario these techniques may still suffer from information loss, as they do not explicitly model the structural context.

B. Citation Recommendation

Citation recommendation is the research field of finding the most relevant papers based on an input query. The studies [1], [3]–[6] proposed utilizing a list of seed papers as a query, and providing recommendations through techniques such as collaborative filtering [3], [6] and PageRank-based methods [1], [4], [5]. Some studies have considered personalized queries, such as meta-data (title, abstract, keyword list, and publication year) [7], [8]. These methods can be effective in helping early-phase research, such as generating research ideas. However, for manuscript-writing tasks they may offer minimal help. Context-based recommendations [9]–[11] take the context of a passage from a manuscript as a query, to provide recommendations aiming to assist the paper-writing process. However, these methods suffer from information loss, as information on citations already known to the user and other ideas in the paper is not taken into consideration.

C. Attention Mechanism

The attention mechanism is commonly applied in vision-related tasks [15], where it can detect certain parts of an image to perform accurate predictions through learning. The authors of [16] extended the Word2Vec model using an attention mechanism to enhance the performance in classification tasks. In this work, we continue to explore the ability of the attention mechanism when applied in an embedding algorithm to enhance the performance for citation recommendation and classification tasks.

III. PRELIMINARIES

A. Notations and Definitions

We adopt the same notation to define hyper-documents as [9]. Let w ∈ 𝒲 denote a word from a vocabulary 𝒲, and d ∈ 𝒟 a document id (the paper DOIs) from an id collection 𝒟. The textual information of a paper H is represented as a sequence of words and document ids, i.e., W ∪ D, where W ⊆ 𝒲 and D ⊆ 𝒟.

Each citation relation (exemplified in Fig. 2) in a paper H is expressed by a tuple ⟨ds, dt, Dn, C⟩, where ds ∈ D is the id of the citing paper, the target id dt ∈ D represents the cited paper, and C ⊆ W is the local context around dt. If other citations exist in the manuscript, then these are defined as the “structural context”, denoted by Dn, where Dn = {dn | dn ∈ D, dn ≠ dt, dn ≠ ds}.

Embedding matrices are denoted as D ∈ ℝ^{k×|𝒟|} for documents and W ∈ ℝ^{k×|𝒲|} for words. The i-th column of D, denoted di, is a k-dimensional vector representing the document di, and the j-th column of W is the k-dimensional vector for the word wj.
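In code, these embedding matrices are simply arrays with one k-dimensional column per item. A toy NumPy sketch, in which the dimension k, the id collection, and the vocabulary are made-up stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

k = 4                                 # embedding dimension (toy value)
doc_ids = ["d0", "d1", "d2"]          # toy id collection
vocab = ["model", "graph", "query"]   # toy vocabulary

# Embedding matrices as defined in the text: one k-dimensional column per item.
D = rng.normal(size=(k, len(doc_ids)))   # D ∈ R^{k×|D|}
W = rng.normal(size=(k, len(vocab)))     # W ∈ R^{k×|W|}

# The i-th column of D is the vector for document doc_ids[i].
d1 = D[:, doc_ids.index("d1")]
print(d1.shape)  # → (4,)
```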

B. Problem Statement

There are two objectives for this task: first, embedding the papers as dense vectors; and second, finding the most similar papers based on the embedded vector of an input paper. The two problems are stated as follows.

1) Hyper-document Embedding: Inspired by [9], we adopt a two-step learning procedure to embed academic papers. The first step aims to embed the document and related contents, and the second embeds the local and structural contexts based on the learned vectors from the first step.

In the first step, the paper X is simply treated as a plain text containing an id di and words {w | w ∈ W}. The objective is to learn to represent di as a k-dimensional vector di, and all the words as a matrix W, where each vector wi is the representation for a word wi ∈ W.

Fig. 2: Citation relations of a paper H.

For the second step, a citation ⟨ds, dt, Dn, C⟩ from the paper X and the previously learned matrix W and vector di are given. Then, each document id di ∈ {ds, dt} ∪ Dn is learned, and these are represented by the matrix Dc, where each column is the k-dimensional vector for the corresponding doc id. Furthermore, Wc ⊆ W represents the embedding matrix for the context words C, where each column is a vector representing w ∈ C.

2) Citation Recommendation: We set three usage cases to simulate the proposed scenario. For each case, the purpose is to find the most similar papers based on vector similarities.

• Case 1: In this case, it is assumed that the author has inserted a number of citations in the completed content of the manuscript. To find the most similar paper, we take the local context of the content without citations and the structural context (previously existing citations) as input vectors, and rank document vectors by similarity.

• Case 2: This case assumes that some of the existing citations are invalid, because they might not be available in the dataset or the author has made typos. When recommending, the local context and randomly selected structural contexts are adopted for the similarity computation.

• Case 3: If all the existing citations are invalid or the author has not inserted any citations, then only the local context is adopted for the similarity computation.
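The three query constructions can be sketched as follows. The toy vectors and the `query_vector` helper are hypothetical illustrations of averaging IN vectors, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 8  # toy embedding dimension

# Hypothetical learned IN vectors for context words and existing citations.
word_vecs = {w: rng.normal(size=k) for w in ["neural", "citation", "graph"]}
doc_vecs = {d: rng.normal(size=k) for d in ["d1", "d2", "d3"]}

def query_vector(context_words, structural_context):
    """Average the IN vectors of the local context and any usable citations."""
    vecs = [word_vecs[w] for w in context_words]
    vecs += [doc_vecs[d] for d in structural_context]
    return np.mean(vecs, axis=0)

ctx = ["neural", "citation", "graph"]
q_case1 = query_vector(ctx, ["d1", "d2", "d3"])   # all existing citations valid
q_case2 = query_vector(ctx, ["d2"])               # only a random subset usable
q_case3 = query_vector(ctx, [])                   # no usable citations
```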

IV. DOCCIT2VEC

A. Representing Documents and Citations

The embedding of documents refers to the process of assigning vectors to each document id and word, and optimizing these with predictions of target words or document ids. For example, pv-dm [14] learns two vectors (IN and OUT vectors) for each word, i.e., wI and wO, respectively, and one IN vector dI for each document id. Given a document id and its content, the model picks a word as the target, and averages over the IN vector of the document id, dI, and those of all the words in the surrounding context, {wiI | i ∈ C}, to make a prediction of the target word’s OUT vector wO. Representing documents as predictions of target words enables the document vector to reflect the words the document contains, i.e., the content.

DocCit2Vec conceptualizes the learning process as the prediction of a target citation, so that an embedded document vector carries the information of a target citation. This process is illustrated in Fig. 3. Given a citation relation, DocCit2Vec picks one publication from the citation list as the target, and utilizes the surrounding context and structural context as known knowledge to maximize the occurrence of the target citation by updating the parameters (i.e., the embedding vectors) of the neural network. The model learns two document embedding vectors, IN and OUT, where the IN vector dI characterizes the document as a citing paper and the OUT vector dO encodes its role as a cited paper [9]. In addition, the model learns IN word vectors wI.

As mentioned in Section III-B, embedding academic papers involves two steps: embedding of the content and of the citations. For the first step, we adopt the retrofitting technique as in [9], which runs a predefined number of iterations of the pv-dm model and then utilizes the learned vectors as the “base” vectors for step two.

Two designs of DocCit2Vec are proposed: the first (DocCit2Vec-avg) uses a conventional averaged hidden layer, and the second (DocCit2Vec-att) adopts an attention hidden layer.

B. DocCit2Vec with an Average Hidden Layer

The architecture of DocCit2Vec-avg is constructed based on the pv-dm structure of Doc2Vec [14] and HyperDoc2Vec [9], shown on the right side of Fig. 3. The model initializes an IN document matrix DI at the input layer, an OUT document matrix DO at the output layer, and an IN word matrix WI at the input layer. Given a citation relation ⟨ds, dt, {di | i ∈ Dn}, {w | w ∈ C}⟩, the model averages over the corresponding IN vectors of ds, {di | i ∈ Dn}, and {w | w ∈ C}. The output layer is computed using a multi-class softmax classifier, and the output value is regarded as the probability of the occurrence of dt.

To learn all the citation relations 𝒞, the model is statistically expressed as

$$\max_{\mathbf{D}^I,\,\mathbf{D}^O,\,\mathbf{W}^I}\ \frac{1}{|\mathcal{C}|}\sum_{\langle d_s,\,d_t,\,D_n,\,C\rangle\in\mathcal{C}} \log P(d_t \mid d_s, D_n, C) \tag{1}$$

The hidden layer of the neural network is expressed as

$$\mathbf{x} = \frac{1}{1+|D_n|+|C|}\left(\mathbf{d}_s^{I} + \sum_{d_n\in D_n}\mathbf{d}_n^{I} + \sum_{w\in C}\mathbf{w}^{I}\right) \tag{2}$$

The output layer adopts a multi-class softmax function, which is represented as

$$P(d_t \mid d_s, D_n, C) = \frac{\exp\left(\mathbf{x}^{T}\mathbf{d}_t^{O}\right)}{\sum_{d\in\mathcal{D}}\exp\left(\mathbf{x}^{T}\mathbf{d}^{O}\right)} \tag{3}$$

The negative sampling technique [13] is adopted to optimize the efficiency of the training procedure:

$$\log\sigma\left(\mathbf{x}^{T}\mathbf{d}_t^{O}\right) + \sum_{i=1}^{n}\mathbb{E}_{d_i\sim P_N(d)}\log\sigma\left(-\mathbf{x}^{T}\mathbf{d}_i^{O}\right) \tag{4}$$
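The negative-sampling objective can be sketched numerically. The random vectors, document count, and uniform negative sampling below are toy stand-ins; a full implementation would sample from the noise distribution P_N(d) and exclude the target d_t:

```python
import numpy as np

rng = np.random.default_rng(2)
k, n_docs, n_neg = 8, 50, 5  # toy sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: hidden vector x (Eq. 2) and OUT document matrix.
x = rng.normal(size=k)
D_out = rng.normal(size=(k, n_docs))

target = 7                                  # index of the cited paper d_t
negatives = rng.choice(n_docs, size=n_neg)  # toy uniform noise samples

# Negative-sampling objective (Eq. 4): reward the target, penalize the noise.
obj = np.log(sigmoid(x @ D_out[:, target]))
obj += np.sum(np.log(sigmoid(-x @ D_out[:, negatives])))
print(obj <= 0.0)  # → True (log-sigmoids are never positive)
```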

To optimize the model, the gradients of the loss function with respect to the parameters DI, DO, and WI are computed. The parameters are then updated with an input learning rate through backpropagation.

C. DocCit2Vec with an Attention Hidden Layer

The architecture of DocCit2Vec-att is the same as that of DocCit2Vec-avg on the right side of Fig. 3, except that the averaged hidden layer is replaced by an attention layer, inspired by [16]. In addition to the original parameters, the weight vector K ∈ ℝ^{1×(|𝒟|+|𝒲|)} is introduced at the attention layer, where each value denotes the importance of a word or document. The model is statistically expressed as

$$\max_{\mathbf{D}^I,\,\mathbf{D}^O,\,\mathbf{W}^I,\,\mathbf{K}}\ \frac{1}{|\mathcal{C}|}\sum_{\langle d_s,\,d_t,\,D_n,\,C\rangle\in\mathcal{C}} \log P(d_t \mid d_s, D_n, C) \tag{5}$$

Instead of the averaged hidden layer, the attention layer computes a weighted sum of the individual word and document vectors through the multiplication of each vector and its weight ratio, which is expressed as follows:

$$\mathbf{x} = \mathbf{d}_s^{I}\,a_{d_s} + \sum_{d_n\in D_n}\mathbf{d}_n^{I}\,a_{d_n} + \sum_{w\in C}\mathbf{w}^{I}\,a_w \tag{6}$$

The terms a_{d_s}, a_{d_n}, and a_w are the associated weight ratios for the documents d_s and d_n and the word w. The weight ratios are computed by using the matrix K as follows:

$$a_i = \frac{\exp k_i}{\sum_{j\in(d_s\cup D_n\cup C)}\exp k_j} \tag{7}$$
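Equations (6) and (7) amount to a softmax-weighted sum over the participating IN vectors. A minimal NumPy sketch, with toy random vectors and weights standing in for the columns of D^I and W^I and the entries of K:

```python
import numpy as np

rng = np.random.default_rng(3)
k = 8  # toy embedding dimension

# IN vectors for the citing doc, two structural-context docs, and three
# context words (toy stand-ins for the relevant columns of D^I and W^I).
inputs = rng.normal(size=(6, k))

# One raw attention weight per participating item, drawn from K.
key = rng.normal(size=6)
a = np.exp(key) / np.exp(key).sum()   # softmax over the items present (Eq. 7)

# Weighted sum replacing the plain average of Eq. 2 (Eq. 6).
x = (a[:, None] * inputs).sum(axis=0)
print(round(a.sum(), 6))  # → 1.0
```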

For the output layer, negative sampling is performed in a manner identical to DocCit2Vec-avg. In addition, the gradient of K is computed to optimize the attention-based model.

V. EXPERIMENTS

A. Datasets and Experimental Settings

Two pre-processed datasets, DBLP and ACL Anthology [9], containing full texts of academic papers in the computer science domain, are utilized. Statistical summaries of the datasets are provided in Table I. We conduct two experiments on the large-sized DBLP dataset: citation recommendation and topic classification. For the citation recommendation, we picked all the documents with more than one citation from the same dataset published in recent years as the test dataset, and the remainder of the dataset was utilized to train the embedding model. Another medium-sized dataset, the ACL Anthology, was also adopted for the citation recommendation experiment. It contains a similar average number of citations per document to DBLP. As with DBLP, we selected a test dataset for testing the recommendation performance and a training dataset to train the model. The titles and included citations of the texts are replaced by indices, so that the algorithm can detect these as hyperlinks. For example, a citation [1] or (Song et al. 2018) is replaced with an index, and the title of the cited paper is replaced by the same index. Following the same settings as in [9], we set 50 words before and after a citation as the citation context.
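The citation-marker replacement and 50-word context extraction described above can be sketched as follows; the sample text, marker format, and the `DOC_42` index are hypothetical:

```python
# Toy manuscript text; "[3]" marks a citation to a paper with a known index.
text = "Context-aware models [3] outperform keyword search in our tests."
citation_ids = {"[3]": "DOC_42"}   # hypothetical mapping to corpus indices

# Replace each citation marker with its document index, as in the preprocessing.
for marker, doc_id in citation_ids.items():
    text = text.replace(marker, doc_id)

# Take up to 50 words before and after the citation as its local context.
words = text.split()
pos = words.index("DOC_42")
context = words[max(0, pos - 50):pos] + words[pos + 1:pos + 51]
print(context[:2])  # → ['Context-aware', 'models']
```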

We implemented three baseline models: Word2Vec, Doc2Vec, and HyperDoc2Vec. The Gensim package [17] was utilized to implement the baseline models, as well as the foundation for developing DocCit2Vec. We employed the same hyper-parameter settings as in [9]. For Word2Vec, the embedding size was set to 100 with a window size of 50 and 5 epochs, using the cbow structure, and the default Gensim settings were followed otherwise. For Doc2Vec, the same embedding size, window size, and epoch setting were adopted with the pv-dbow structure. For HyperDoc2Vec, the same embedding and window size were adopted, with 100 iterations and 1000 negative samplings, and with the initialization of Doc2Vec at five epochs. The same settings were adopted for DocCit2Vec as for HyperDoc2Vec. The models were implemented on a Linux server with a 12-core Intel Xeon E5-1650 CPU and 128 GB of memory, installed with Anaconda 5.2.0 and Gensim 2.3.0.

B. Recommendation Experiments

As mentioned in Section III-B2, we designed three cases for the citation recommendation experiment. Each case adopts a different input query:

• Case 1: Utilize the averaged vector of the context words (50 words before and after a citation) and the structural contexts.

• Case 2: Utilize the averaged vector of the context words and randomly selected structural contexts.

• Case 3: Utilize the averaged vector of the context words.

We adopt different similarity calculation methods for the best performance of each model, as described in [9]. We employ the IN-for-OUT (I4O) method for Word2Vec (W2V) and HyperDoc2Vec (HD2V), which uses the averaged IN feature vectors to rank the OUT document vectors by the dot product. Doc2Vec implements an IN-for-IN (I4I) method, which first infers a vector from the IN feature vectors by applying the learned model, and then ranks the IN document vectors by the cosine similarity. In addition, as in [9], we run Doc2Vec-nc (D2V-nc) on the training file without citations, and Doc2Vec-cac (D2V-cac) on the training file with “augmented contexts”, i.e., each citation context is copied into the target document. For DocCit2Vec-avg (DC2V-avg) and DocCit2Vec-att (DC2V-att), we employ the same I4O method. For the evaluation, the recall, MAP, and nDCG are reported and compared for the top 10 results.

Fig. 3: Overview of DocCit2Vec
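The I4O scoring reduces to a dot product between the averaged query vector and every OUT document vector. A sketch with random stand-in matrices (the sizes and vectors are toy values, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(4)
k, n_docs = 8, 100  # toy sizes

# Hypothetical averaged IN query vector and learned OUT document matrix.
q = rng.normal(size=k)
D_out = rng.normal(size=(k, n_docs))

# IN-for-OUT (I4O): rank all OUT document vectors by dot product with the query.
scores = q @ D_out
top10 = np.argsort(-scores)[:10]
print(len(top10))  # → 10
```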

Four observations can be made from the results (Table II). First, DocCit2Vec-avg demonstrated a superior performance on the larger-sized dataset, DBLP, with a significant improvement. The recall was higher by approximately 12% to 15%, according to the different cases, compared to the second-best model HyperDoc2Vec, with 4% to 7% improvements for the MAP and 6% to 11% for the nDCG. Second, all the models yielded better performances on the medium-sized ACL Anthology dataset, except for DocCit2Vec-avg, which may reveal that DocCit2Vec-avg requires a larger volume of data to converge. HyperDoc2Vec yielded the best results for this dataset, followed by DocCit2Vec-avg with close scores. Third, all the baseline models exhibit similar scores across the three cases; DocCit2Vec-avg does not. It is observed that DocCit2Vec-avg consistently yielded the best scores for the first case, where all the structural contexts are included, and the second best for the second case, where the structural contexts are randomly picked. This indicates that the information on the structural contexts is embedded into the embedding vectors. Fourth, the performances of DocCit2Vec-att are among the lowest, suggesting that citation recommendation is not a suitable task for this model.

TABLE I: Statistics of the Datasets

Dataset  No. of Docs (Train / Test)      No. of Citations (Train / Test)      Experiments
ACL      20,408 (14,654 / 1,563)         108,729 (79,932 / 28,797)            Citation Rec.
DBLP     649,114 (630,909 / 18,205)      2,874,303 (2,770,712 / 103,591)      Citation Rec., Classifications

TABLE II: Citation Recommendation Results

                     ACL Anthology              DBLP
Model                Recall  MAP    nDCG        Recall  MAP    nDCG
W2V (Case 1)         27.25   13.74  19.51       20.47   10.54  14.71
W2V (Case 2)         26.50   13.74  19.51       20.47   10.55  14.71
W2V (Case 3)         26.06   13.21  18.66       20.15   10.40  14.49
D2V-nc (Case 1)      19.92    9.06  13.39        7.90    3.17   4.96
D2V-nc (Case 2)      19.89    9.06  13.38        7.90    3.17   4.96
D2V-nc (Case 3)      19.89    9.07  13.38        7.91    3.17   4.97
D2V-cac (Case 1)     20.51    9.24  13.68        7.91    3.17   4.97
D2V-cac (Case 2)     19.89    9.06  13.38        7.91    3.17   4.97
D2V-cac (Case 3)     20.51    9.24  13.69        7.89    3.17   4.97
HD2V (Case 1)        37.53   19.64  27.20       28.41   14.20  20.37
HD2V (Case 2)        36.85   19.64  27.20       28.43   14.20  20.39
HD2V (Case 3)        36.24   19.32  26.79       28.41   14.20  20.37
DC2V-att (Case 1)    27.48   13.42  19.24        7.38    3.36   4.89
DC2V-att (Case 2)    25.01   12.37  17.66        6.06    2.73   3.98
DC2V-att (Case 3)    25.01   11.48  16.27        5.20    2.36   3.43
DC2V-avg (Case 1)    36.89   20.44  27.72       44.23   21.80  31.34
DC2V-avg (Case 2)    33.67   18.40  25.10       40.37   20.15  28.69
DC2V-avg (Case 3)    31.14   16.97  23.20       40.37   19.02  26.84
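The reported metrics can be computed as below for a single query with one relevant (actually cited) paper. The toy ranking is hypothetical; in the experiments, scores are averaged over all test citations and truncated at the top 10:

```python
import numpy as np

def recall_at_k(ranked, relevant, k=10):
    """Fraction of relevant documents retrieved in the top k."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def average_precision_at_k(ranked, relevant, k=10):
    """Precision averaged over the rank positions of relevant hits."""
    hits, score = 0, 0.0
    for i, doc in enumerate(ranked[:k]):
        if doc in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(relevant), k)

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance nDCG: discounted gain over the ideal ordering."""
    dcg = sum(1 / np.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

ranked = ["d3", "d7", "d1", "d9"]   # toy system output
relevant = ["d7"]                   # the paper actually cited here
print(recall_at_k(ranked, relevant))  # → 1.0
```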

C. Classification Experiments

We conduct two classification experiments: topic classification³ and classification of the functionality of citations⁴. The two experiments aim to test the performances of the document and word vectors separately: the first experiment adopts document vectors, and the second uses word vectors. The dataset for topic classification includes 5,975 academic papers and the 10 unique fields to which they belong. The dataset for the classification of the functionality of citations includes 2,824 citation contexts, each with a classified functionality, such as “PBas: Cited work used as a basis or starting point”, representing 12 unique classes.

We employ three methods for the first experiment: the first uses a concatenation of the IN and OUT vectors of the documents (“IN+OUT” in Table III), the second uses the IN vectors of the documents, and the third concatenates the IN or IN+OUT vectors with embedding vectors from DeepWalk [19]. For the second experiment, we utilize the averaged IN vector of the context words.

DocCit2Vec-att yields the best performance in topic classification (Table III), with F1-macro and F1-micro scores approximately 3% to 4% higher compared to the second-best model HyperDoc2Vec. DocCit2Vec-avg is ranked as the second lowest among all the models. For the second experiment, neither DocCit2Vec-att nor DocCit2Vec-avg is ranked at the top. First, the results reveal that DocCit2Vec-att improves the classification abilities of document embedding vectors, although not word embedding vectors. Second, the reason for the inferiority of DocCit2Vec-avg is that it takes multiple documents as input, and therefore it emphasizes the “similarity” between documents. In summary, attention focuses on “difference”, whereas the averaged layer emphasizes “similarity”.

³ Cora dataset, https://people.cs.umass.edu/~mccallum/data.html
⁴ Dataset from [18].

VI. CONCLUSION

We propose a novel document embedding model, DocCit2Vec, which considers the “structural context” to improve recommendation performance when writing academic papers. Two implementations of the model are proposed: the first with an average hidden layer (DocCit2Vec-avg) and the second with an attention hidden layer (DocCit2Vec-att). Experimental results demonstrate the superior performance of DocCit2Vec-avg for citation recommendation tasks. Furthermore, DocCit2Vec-att performs effectively on the classification task when document embedding vectors are adopted. In the future, we plan to design more sophisticated neural networks, and to combine the model with graph- and keyword-based approaches to further improve performance.

TABLE III: Results of Classification Experiments

                      Topic Classification                      Classification of Functionality
                      Original            with DeepWalk
Model                 F1-micro  F1-macro  F1-micro  F1-macro    F1-micro  F1-macro
W2V (IN+OUT)          58.11     37.00     74.62     63.66       N/A       N/A
W2V (IN)              57.83     36.90     76.61     66.94       77.36     56.52
D2V-nc                78.70     70.91     82.67     76.16       63.51      6.47
D2V-cac               78.99     71.11     82.41     75.95       63.51      6.47
HD2V (IN+OUT)         80.26     73.72     82.38     76.11       N/A       N/A
HD2V (IN)             79.12     72.78     82.38     76.30       75.63     54.33
DC2V-att (IN+OUT)     82.70     76.86     84.84     79.50       N/A       N/A
DC2V-att (IN)         83.80     78.43     85.49     80.59       74.46     54.43
DC2V-avg (IN+OUT)     77.56     70.43     79.23     72.65       N/A       N/A
DC2V-avg (IN)         75.28     68.18     77.28     70.36       75.91     54.67

REFERENCES

[1] H. Jia and E. Saule, “An analysis of citation recommender systems: Beyond the obvious,” in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017, Sydney, Australia, July 31 - August 03, 2017, J. Diesner, E. Ferrari, and G. Xu, Eds. ACM, 2017, pp. 216–223. [Online]. Available: https://doi.org/10.1145/3110025.3110150

[2] ——, “Local is good: A fast citation recommendation approach,” in Advances in Information Retrieval - 40th European Conference on IR Research, ECIR 2018, Grenoble, France, March 26-29, 2018, Proceedings, ser. Lecture Notes in Computer Science, G. Pasi, B. Piwowarski, L. Azzopardi, and A. Hanbury, Eds., vol. 10772. Springer, 2018, pp. 758–764. [Online]. Available: https://doi.org/10.1007/978-3-319-76941-7_73

[3] C. Caragea, A. Silvescu, P. Mitra, and C. L. Giles, “Can’t see the forest for the trees?: A citation recommendation system,” in 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’13, Indianapolis, IN, USA, July 22 - 26, 2013, J. S. Downie, R. H. McDonald, T. W. Cole, R. Sanderson, and F. Shipman, Eds. ACM, 2013, pp. 111–114. [Online]. Available: https://doi.org/10.1145/2467696.2467743

[4] M. Gori and A. Pucci, “Research paper recommender systems: A random-walk based approach,” in 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006), 18-22 December 2006, Hong Kong, China. IEEE Computer Society, 2006, pp. 778–781. [Online]. Available: https://doi.org/10.1109/WI.2006.149

[5] O. Kucuktunc, E. Saule, K. Kaya, and U. V. Catalyurek, “Towards a personalized, scalable, and exploratory academic recommendation service,” in Advances in Social Networks Analysis and Mining 2013, ASONAM ’13, Niagara, ON, Canada - August 25 - 29, 2013, J. G. Rokne and C. Faloutsos, Eds. ACM, 2013, pp. 636–641. [Online]. Available: https://doi.org/10.1145/2492517.2492605

[6] S. M. McNee, I. Albert, D. Cosley, P. Gopalkrishnan, S. K. Lam, A. M. Rashid, J. A. Konstan, and J. Riedl, “On the recommending of citations for research papers,” in CSCW 2002, Proceeding on the ACM 2002 Conference on Computer Supported Cooperative Work, New Orleans, Louisiana, USA, November 16-20, 2002, E. F. Churchill, J. F. McCarthy, C. Neuwirth, and T. Rodden, Eds. ACM, 2002, pp. 116–125. [Online]. Available: https://doi.org/10.1145/587078.587096

[7] A. Alzoghbi, V. A. A. Ayala, P. M. Fischer, and G. Lausen, “Pubrec: Recommending publications based on publicly available meta-data,” in Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, Trier, Germany, October 7-9, 2015, ser. CEUR Workshop Proceedings, R. Bergmann, S. Görg, and G. Müller, Eds., vol. 1458. CEUR-WS.org, 2015, pp. 11–18. [Online]. Available: http://ceur-ws.org/Vol-1458/D01_CRC69_Alzoghbi.pdf

[8] S. Li, P. Brusilovsky, S. Su, and X. Cheng, “Conference paper recommendation for academic conferences,” IEEE Access, vol. 6, pp. 17153–17164, 2018. [Online]. Available: https://doi.org/10.1109/ACCESS.2018.2817497

[9] J. Han, Y. Song, W. X. Zhao, S. Shi, and H. Zhang, “hyperdoc2vec: Distributed representations of hypertext documents,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, I. Gurevych and Y. Miyao, Eds. Association for Computational Linguistics, 2018, pp. 2384–2394. [Online]. Available: https://www.aclweb.org/anthology/P18-1222/

[10] Q. He, D. Kifer, J. Pei, P. Mitra, and C. L. Giles, “Citation recommendation without author supervision,” in Proceedings of the Forth International Conference on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9-12, 2011, I. King, W. Nejdl, and H. Li, Eds. ACM, 2011, pp. 755–764. [Online]. Available: https://doi.org/10.1145/1935826.1935926

[11] Q. He, J. Pei, D. Kifer, P. Mitra, and C. L. Giles, “Context-aware citation recommendation,” in Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, M. Rappa, P. Jones, J. Freire, and S. Chakrabarti, Eds. ACM, 2010, pp. 421–430. [Online]. Available: https://doi.org/10.1145/1772690.1772734

[12] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2013. [Online]. Available: http://arxiv.org/abs/1301.3781

[13] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, C. J. C. Burges, L. Bottou, Z. Ghahramani, and K. Q. Weinberger, Eds., 2013, pp. 3111–3119. [Online]. Available: http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality

[14] Q. V. Le and T. Mikolov, “Distributed representations of sentences and documents,” in Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, ser. JMLR Workshop and Conference Proceedings, vol. 32. JMLR.org, 2014, pp. 1188–1196. [Online]. Available: http://proceedings.mlr.press/v32/le14.html

[15] Y. Tang, N. Srivastava, and R. Salakhutdinov, “Learning generative models with visual attention,” in Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds., 2014, pp. 1808–1816. [Online]. Available: http://papers.nips.cc/paper/5345-learning-generative-models-with-visual-attention

[16] W. Ling, Y. Tsvetkov, S. Amir, R. Fermandez, C. Dyer, A. W. Black, I. Trancoso, and C. Lin, “Not all contexts are created equal: Better word representations with variable attention,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, L. Marquez, C. Callison-Burch, J. Su, D. Pighin, and Y. Marton, Eds. The Association for Computational Linguistics, 2015, pp. 1367–1372. [Online]. Available: https://www.aclweb.org/anthology/D15-1161/

[17] R. Rehurek and P. Sojka, “Software Framework for Topic Modelling with Large Corpora,” in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Valletta, Malta: ELRA, May 2010, pp. 45–50, http://is.muni.cz/publication/884893/en.

[18] S. Teufel, A. Siddharthan, and D. Tidhar, “Automatic classification of citation function,” in EMNLP 2006, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 22-23 July 2006, Sydney, Australia, D. Jurafsky and E. Gaussier, Eds. ACL, 2006, pp. 103–110. [Online]. Available: https://www.aclweb.org/anthology/W06-1613/

[19] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: online learning of social representations,” in The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA - August 24 - 27, 2014, S. A. Macskassy, C. Perlich, J. Leskovec, W. Wang, and R. Ghani, Eds. ACM, 2014, pp. 701–710. [Online]. Available: https://doi.org/10.1145/2623330.2623732

