Information and Software Technology 109 (2019) 1–13

Contents lists available at ScienceDirect

Information and Software Technology

journal homepage: www.elsevier.com/locate/infsof

Is deep learning better than traditional approaches in tag recommendation for software information sites?

Pingyi Zhou a, Jin Liu a,b,*, Xiao Liu c, Zijiang Yang d, John Grundy e

a School of Computer Science, Wuhan University, Wuhan, China
b Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
c School of Information Technology, Deakin University, Geelong, Australia
d Department of Computer Science, Western Michigan University, Kalamazoo, Michigan, USA
e Faculty of Information Technology, Monash University, Melbourne, Australia

* Corresponding author. E-mail addresses: [email protected] (P. Zhou), [email protected] (J. Liu), [email protected] (X. Liu), [email protected] (Z. Yang), [email protected] (J. Grundy).

https://doi.org/10.1016/j.infsof.2019.01.002
Received 17 January 2018; Received in revised form 8 January 2019; Accepted 9 January 2019; Available online 10 January 2019
0950-5849/© 2019 Elsevier B.V. All rights reserved.

Keywords: Deep learning, Data analysis, Tag recommendation, Software information site, Software object

Abstract

Context: Inspired by the success of deep learning in other domains, this new technique has been gaining widespread recent interest in being applied to diverse data analysis problems in software engineering. Many deep learning models, such as CNN, DBN, RNN, LSTM and GAN, have been proposed and recently applied to software engineering tasks including effort estimation, vulnerability analysis, code clone detection, test case selection, requirements analysis and many others. However, there is a perception that deep learning is a "silver bullet" whenever it can be applied to a software engineering data analysis problem.

Objective: This motivated us to ask whether deep learning is better than traditional approaches for the tag recommendation task for software information sites.

Method: In this paper we test this question by applying both the latest deep learning approaches and several traditional approaches to the tag recommendation task for software information sites. This is a typical software engineering automation problem where intensive data processing is required to link disparate information to assist developers. Four different deep learning approaches – TagCNN, TagRNN, TagHAN and TagRCNN – are implemented and compared with three advanced traditional approaches – EnTagRec, TagMulRec, and FastTagRec.

Results: Our comprehensive experimental results show that the performance of these different deep learning approaches varies significantly. TagRNN and TagHAN perform worse than the traditional approaches in tag recommendation tasks, whereas TagCNN and TagRCNN perform better.

Conclusion: Therefore, using appropriate deep learning approaches can indeed achieve better performance than traditional approaches in tag recommendation tasks for software information sites.

1. Introduction

Deep learning has been proven to be effective for some software engineering tasks, such as API embedding representation [1], modeling source code [2], code clone detection [3], semantically enhanced software traceability [4], and predicting semantically linkable knowledge in developer online forums [5]. Various deep learning models have been proposed, including commonly used ones such as Convolutional Neural Networks (CNN) [6], Deep Belief Networks (DBN) [7], Recurrent Neural Networks (RNN) [8], Long Short Term Memory networks (LSTM) [9], Gated Recurrent Unit neural networks (GRU) [10], and Generative Adversarial Networks (GAN) [11]. Each model has its own typical application scenarios and requires a different amount of training time. Thus, choosing an appropriate model and balancing efficiency and effectiveness is a key challenge in applying deep learning successfully to a software engineering problem [2,5,12]. Nevertheless, given the huge increase in recent papers on the topic, increasing numbers of researchers seem to believe that deep learning is superior to more traditional techniques for data analysis, and even regard it as perhaps one of Brooks' "silver bullets" for solving software engineering problems [13].

In a recent ASE'16 paper [5], Xu et al. applied CNN to predict semantically linkable knowledge in developer online forums. A question and its answers in a software information site such as StackOverflow can be considered as a knowledge unit. Two knowledge units are linkable if they are semantically related. Since the problem of predicting semantically linkable knowledge units can be reduced to a multiclass classification problem, the deep learning model CNN is a good choice

and indeed it achieves good performance. However, in an FSE'17 paper [12], Fu and Menzies repeated the study using a support vector machine (SVM) with differential evolution (DE). DE is a more traditional hyper-parameter optimization method. Their empirical study showed that the SVM with DE method achieves similar performance to CNN, but its training time is 84 times faster than that of CNN. Based on this result, they concluded that traditional methods can be as good as, or even better than, the best current deep learning approaches for some data intensive software engineering problems.

However, a single case study is not general enough to determine the pros and cons of applying deep learning in software engineering. In this paper we conduct a further comparative evaluation on tag recommendation for software information sites, a typical software engineering problem requiring intensive data processing. Software information sites [14–16] offer indispensable platforms for software developers to search for solutions, share experience, offer help and learn new techniques [17–21]. The contents posted on these software information sites, such as a question with its answers in a developer Q&A community (e.g. StackOverflow 1) and a project in an open source software community (e.g. GitHub 2), are regarded as software objects [14,15]. As the software information sites evolve, the number of software objects grows significantly. To facilitate search and classification, it is a common practice for software developers to add tags. High quality tags are expected to be concise and to describe the most important features of the software objects.

Unfortunately, tagging is inherently a distributed and uncoordinated process [14]. The quality of tags depends not only on a developer's understanding of the software object but also on the developer's language skills and preferences. As a result, the number of different tags grows rapidly along with the continuous addition of software objects. For example, there are more than 20 million questions and 46 thousand tags in StackOverflow. Among such a large number of tags, many are different but redundant, and it is becoming harder to maintain software information sites.

In this paper, we apply four different contemporary deep learning approaches to the tag recommendation task, called TagCNN, TagRNN, TagHAN and TagRCNN. They are based on Convolutional Neural Networks (CNN) [22], Recurrent Neural Networks (RNN) [23], Hierarchical Attention Networks (HAN) [24], and Recurrent Convolutional Neural Networks (RCNN) [25], respectively. These four deep learning models have been widely used for many disparate text classification tasks. Three traditional approaches – EnTagRec [15], TagMulRec [26], and FastTagRec [27], which have previously been proposed for tag recommendation tasks – are used as comparison approaches. Our research focuses on whether there is a deep learning approach that can achieve better performance than traditional approaches for tag recommendation for software information sites. Through our extensive experiments we demonstrate a valid comparison between deep learning and traditional approaches in tag recommendation tasks. Our initial hypothesis was that traditional approaches have been well tuned for tag recommendation and thus should be able to beat the four deep learning methods. To our surprise, our empirical study shows that two of the deep learning approaches actually outperform the traditional approaches.

The main contributions of this paper include:

• With the growing trend towards applying deep learning techniques in software engineering, we conduct an empirical study to examine whether deep learning is really superior to traditional approaches for a representative data intensive software engineering task – tag recommendation for software information sites. Our experiments on a broad range of datasets containing a total of 11,352,714 software objects demonstrate the effectiveness of some deep learning methods. The results show that appropriate deep learning methods can indeed outperform well-tuned traditional methods in tag recommendation for software information sites.

• We tackle the problem of automated tag recommendation in large software information sites. We propose four different new deep learning methods for this problem – TagCNN, TagRNN, TagHAN and TagRCNN – and compare their efficiency and effectiveness against three traditional approaches – EnTagRec, TagMulRec, and FastTagRec. Our comprehensive experiments demonstrate that our deep learning methods TagCNN and TagRCNN are better than these traditional approaches.

• We identify some guidelines for other software engineering tool developers from our experiences in the tag recommendation task that we hope will be useful in developing their own deep learning-based solutions for data intensive software engineering tasks.

1 http://www.stackoverflow.com
2 http://github.com

The rest of this paper is organized as follows: Section 2 presents the background and related work on tag recommendation for software information sites and deep learning. Section 3 explains the four deep learning methods in detail. Section 4 describes the experimental settings of our study, including research questions, datasets, experimental design, and evaluation measures, as well as the experimental results and limitations. In Section 5, we share some of the important lessons that we learned in implementing these deep learning methods, and discuss threats to the validity of our study. Finally, Section 6 concludes the paper.

2. Background and related work

2.1. Tag recommendation for software information sites

Tags provide a type of metadata to search, describe, identify, bookmark, classify, and organize software objects in software information sites [16]. They are widely used in developer Q&A and open source communities. A software object in a developer Q&A community, such as StackOverflow, includes a title, body, tags, comments and so on. A software object in an open source community, such as Freecode, includes a software project name, a project description, tags and so on. When a developer posts a question in a Q&A community or shares a project in an open source community, software information sites usually require developers to classify their contents with multiple tags at the time of posting. For example, StackOverflow suggests that developers attach at least three but no more than five tags per posting. Freecode allows developers to create more than ten tags for each posting.

Since software developers are free to choose tags, the words used for tags are arbitrary. Tags that are intended to represent the same meaning frequently use different forms, such as spaces vs. no spaces, upper case vs. lower case, acronym vs. full spelling, and hyphens vs. no hyphens. This phenomenon makes it difficult for software developers to search for existing tags, so they become more likely to use their own wording, which leads to even more synonymous tags with different spellings. Automated tag recommendation can alleviate this problem by recommending tags that have been used for semantically similar software objects.

Tag recommendation has been a popular research topic in the fields of social networks and data mining [28–32]. Automatic tag recommendation in software engineering was first proposed by Al-Kofahi et al. in 2010 [16]. They proposed a method called TAGREC to automatically recommend tags for work items in IBM Jazz. TAGREC was based on fuzzy set theory and considered the dynamic evolution of a system. Later, a method called TAGCOMBINE [14] was proposed to automatically recommend tags for software objects in software information sites. It consists of three components: a multi-label ranking component, a similarity-based ranking component, and a tag-term based ranking component. The multi-label ranking approach adopted by TAGCOMBINE limits its application to relatively small datasets. For a large-scale software information site such as StackOverflow, TAGCOMBINE has to


train more than forty thousand binary classifier models, and each of its training sets contains more than ten million objects.

A more recent approach called EnTagRec [15] outperforms TAGCOMBINE in terms of Recall and Precision metrics. EnTagRec consists of two components: a Bayesian inference component and a Frequentist inference component. However, EnTagRec is not scalable either, as it also utilizes all information in a software information site to recommend tags for a software object. Later, a state-of-the-art tag recommendation method, TagMulRec, was proposed by Zhou et al. in 2017 [26]. For a given software object, TagMulRec prunes the large-scale set of categories (tags) into a much smaller set of target category candidates for similarity distance computation. In addition, neither TAGCOMBINE nor EnTagRec adapts to the dynamic evolution of software information sites. In contrast, TagMulRec is scalable and is able to handle continuous updates in software information sites. However, TagMulRec only utilizes the small portion of a software information site that is most relevant to a given software object. Recently, an advanced method called FastTagRec [27] was shown to outperform TagMulRec in terms of efficiency and effectiveness. FastTagRec exploits a neural network approach based on a single-hidden-layer neural network and a rank constraint on words. In order to avoid the limitation of TagMulRec, FastTagRec utilizes parameters shared among features and tags so that it can exploit all information in software information sites.

In this paper, we only recommend tags which already exist in software information sites. All of the methods used to recommend tags in this paper are based only on the textual content of the software object, thus avoiding the "cold-start problem" [33] and "content-based tag recommendation" core issues in the machine learning community. We will consider these issues in our future work.

2.2. Deep learning

Deep learning is an increasingly popular technique from machine learning, building upon the concept of artificial neural networks. Feedforward neural networks represent a traditional neural network structure and lay the foundation for deep learning model structures [34]. Based on feedforward neural networks, deep learning model structures feature more complex network connections in a larger number of layers. However, the number of parameters in a fully connected feedforward neural network grows extremely large as the width and depth of the network increase. To address this limitation, researchers and industrial practitioners have proposed various deep learning model structures targeting different types of practical problems. For example, convolutional neural networks (CNN) have been the dominant approach for computer vision (CV) tasks [35]. For natural language processing (NLP) tasks, recurrent neural networks (RNN) are particularly well suited to processing sequential text data [36]. Hierarchical attention networks (HAN) [24] can select qualitatively informative words and sentences based on the hierarchical structure of text (words form sentences, sentences form a document) in text classification tasks. However, an RNN tends to put more weight on recent information, so later words are more dominant than earlier ones. The recurrent convolutional neural network (RCNN) [25] was proposed to address this limitation for text classification tasks. All of the above models depend on word vector models, which translate a word into a numerical vector to provide input translation. Widely-used word vector methods include Word2Vec [37], FastText [38] and GloVe [39]. Word2Vec is a shallow, two-layer neural network architecture that includes the continuous bag-of-words (CBOW) architecture and the continuous skip-gram architecture to produce a distributed representation of words. The FastText method is based on the CBOW model of Word2Vec. The GloVe method utilizes global word-word co-occurrence statistics from a corpus to produce a vector space with meaningful substructure.

Deep learning has made great progress in natural language processing, machine vision and other fields. Recently, deep learning has also demonstrated its effectiveness when applied to various types of

recommender systems, including video recommendation [40], app recommendation [41], news recommendation [42], etc. Many different types of deep learning techniques (CNN, RNN, HAN, Restricted Boltzmann Machines, Generative Adversarial Networks, Deep Reinforcement Learning, etc.) have been applied to these recommendation systems [43–49]. For comparison purposes, in this paper we apply popular deep learning models (CNN, RNN, HAN and RCNN) to software object tag recommendation in software information sites and report comprehensive experimental results.

2.3. Deep learning in software engineering

Such state-of-the-art deep learning algorithms have recently seeded promising new methods in the software engineering field [1–5]. Since deep learning techniques have been successfully applied in various domains such as computer vision and natural language processing, they have attracted great attention from researchers and industrial practitioners in software engineering [5,26,50–52]. Various deep learning models have been applied to solve different software engineering tasks, such as API embedding representation, modeling source code, code clone detection, semantically enhanced software traceability, predicting semantically linkable knowledge in developer online forums and so on. Deep learning models play two different categories of roles in these software engineering tasks. In the first category, deep learning models are used as a pre-processor. For example, Nguyen et al. proposed utilizing word2vec, a kind of deep learning model, to represent APIs as vectors for API usages and applications [1]. In the second category, deep learning models are used as the solution. For example, Xu et al. adopt a neural language model and convolutional neural networks (CNN) to capture word- and document-level semantics of knowledge units [5].

3. Our approach

In this section, we first formally present the tag recommendation problem. We then propose four deep learning methods – TagCNN, TagRNN, TagHAN and TagRCNN – to solve it.

3.1. Problem formulation

A software information site is a set S = {o_1, ⋯, o_n}, where o_i (1 ≤ i ≤ n) denotes a software object. For a developer Q&A site, the attributes of o_i consist of an identifier id, a body b, a title tt, and a set of tags T. For an open source site, the attributes of o_i consist of a project name na, a project description d, and a set of tags T. If we treat the combination of the title tt and body b of a software object in a Q&A site as a project description d, we can assume that any software object o_i contains a description d and a set of tags T. The tags in a software information site S form a set 𝒯 = {t_1, ⋯, t_m}, and the tags associated with an object o_i are a subset of 𝒯. The research question in the tag recommendation task is the following: given a large set of existing software objects that are attached with tags, how can we automatically recommend a set of appropriate tags for a new software object o_i?
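For readers who prefer code to notation, the following minimal Python sketch shows one way to represent software objects and the site-wide tag set from this formulation. The field and variable names are illustrative only and are not taken from our implementation.

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class SoftwareObject:
    """One software object o_i: a description d and a set of tags T."""
    id: str
    description: str                              # o_i.d (title + body for a Q&A site)
    tags: Set[str] = field(default_factory=set)   # o_i.T, a subset of the site tag set

# A software information site S is a collection of software objects.
site: List[SoftwareObject] = [
    SoftwareObject("1", "how to parse json in python", {"python", "json"}),
    SoftwareObject("2", "segmentation fault in c pointer arithmetic", {"c", "pointers"}),
]

# The site-wide tag set 𝒯 = {t_1, ..., t_m} is the union of all object tag sets.
tag_vocabulary = sorted(set().union(*(o.tags for o in site)))
print(tag_vocabulary)   # ['c', 'json', 'pointers', 'python']

Tag recommendation then means predicting k tags from tag_vocabulary for a new object's description.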

3.2. TagCNN

We first propose a new deep learning-based TagCNN method for tag recommendation, as depicted in Fig. 1. TagCNN is based on CNN [53,54], a technique that has proven successful for text classification elsewhere. This suggests it could be a good choice for the tag recommendation domain. The major steps of TagCNN are the following:

1. Given a software object o_i, let x_i ∈ ℝ^dim be the dim-dimensional word vector corresponding to the i-th word in the description o_i.d. A description of length n is represented as

x_{1:n} = x_1 ⊕ x_2 ⊕ ⋯ ⊕ x_n,    (1)

Fig. 1. The overall architecture of TagCNN.

where ⊕ is the concatenation operator, and x_{i:i+j} refers to the concatenation of words x_i, x_{i+1}, ⋯, x_{i+j}. It can be represented by an n × dim matrix, as shown in Fig. 1. These word vectors were trained with Mikolov's method [37].

2. A convolution operation involves a filter f ∈ ℝ^{h·dim} that is applied to a region of h words to produce a new feature. For example, a feature c_i is generated from a region of words x_{i:i+h−1} by

c_i = tanh(f · x_{i:i+h−1} + b).    (2)

Here b ∈ ℝ is a bias term and tanh is the non-linear hyperbolic tangent function. If h is bigger than n, we apply zero padding of size (h−n) × dim to the input layer:

c = tanh(f · (x_{1:n} ⊕ zero((h−n) × dim)) + b).    (3)

Applying this filter to each possible region of words in the description, {x_{1:h}, x_{2:h+1}, ⋯, x_{n−h+1:n}}, produces a feature map

c = [c_1, c_2, ⋯, c_{n−h+1}],    (4)

with c ∈ ℝ^{n−h+1}.

3. TagCNN applies a 1-max-pooling operation over the feature map and takes the maximum value ĉ = max{c} as the feature corresponding to this particular filter. The aim is to capture the most important feature for each feature map, and the pooling process naturally deals with variable description lengths. TagCNN uses multiple filters with different region sizes to get multiple features. These features form the penultimate layer z = [ĉ_1, ĉ_2, ⋯, ĉ_m].

4. The loss function and the activation function play different roles in a deep learning network structure. The activation function provides the nonlinear modeling ability of the network, while the loss function is used to measure the prediction quality of the network model. The sigmoid function maps a real value into the interval (0, 1), so it can be used in the output layer for binary classification. The softmax function is a generalization of the logistic function that squashes a T-dimensional vector z of arbitrary real values into a T-dimensional vector σ(z) of real values in the range (0, 1) that add up to 1, so it can be used for multi-class classification according to the size of the tag set 𝒯. Since the number of tags is greater than 2, a softmax layer is used as the output layer in our paper. The penultimate layer z is passed to the fully connected softmax layer, and TagCNN uses the softmax function to compute the probability distribution over tags; in probability theory, the output of the softmax layer represents a probability distribution over the different possible tags [55].

P = softmax(W · z + B).    (5)

Here W is the weight matrix and B is a bias term of the fully connected layer. Each dimension of P ∈ ℝ^{|𝒯|} represents the probability that the corresponding tag should be recommended to the software object. The k tags with the highest probability ranking are recommended to the software object o_i.
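To make the TagCNN pipeline above concrete, the following Keras-style sketch wires together an embedding layer, parallel convolutions over several region sizes, 1-max-pooling and a softmax output over the tag vocabulary. It is an illustrative sketch only: the hyper-parameter values are placeholders and the embedding layer is randomly initialized here rather than loaded from pre-trained Word2Vec vectors.

import tensorflow as tf

def build_tagcnn(vocab_size, dim, max_len, num_tags, region_sizes=(2, 3, 4, 5), filters=25):
    """A minimal TagCNN-style classifier: embeddings -> parallel Conv1D -> 1-max-pooling -> softmax."""
    words = tf.keras.Input(shape=(max_len,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, dim)(words)          # word vectors (pre-trained in the paper)
    pooled = []
    for h in region_sizes:
        c = tf.keras.layers.Conv1D(filters, h, activation="tanh")(x)  # Eq. (2): one feature map per filter
        pooled.append(tf.keras.layers.GlobalMaxPooling1D()(c))        # 1-max-pooling over each feature map
    z = tf.keras.layers.Concatenate()(pooled)                          # penultimate layer z
    probs = tf.keras.layers.Dense(num_tags, activation="softmax")(z)   # Eq. (5): distribution over tags
    return tf.keras.Model(words, probs)

model = build_tagcnn(vocab_size=50000, dim=300, max_len=400, num_tags=1000)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()

At prediction time the k tags with the highest probabilities in the softmax output are recommended.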

3.3. TagRNN

We now propose a new TagRNN method that is based on RNN techniques. Recurrent Neural Networks (RNN) [23] are one of the most popular architectures used for NLP-based problems. Again, this suggests they could be a good choice of technique for the tag recommendation domain. Fig. 2 shows the overall TagRNN structure for tag recommendation, which involves the following major steps.

Fig. 2. The overall architecture of TagRNN.

1. The description o_i.d of the given software object o_i is a text sequence {x_1, x_2, ⋯, x_n}, where n is the length of o_i.d and x_i ∈ ℝ^dim is the dim-dimensional word vector corresponding to the i-th word in o_i.d. We obtain the vector representation (embedding) x_i of the i-th word from the pre-trained word vectors [37].

2. TagRNN is able to process a text sequence of arbitrary length by recursively applying a transition function to its internal hidden state vector h_i ∈ ℝ^H over the input sequence. The activation of the hidden state h_i at the i-th step is computed with the rectified linear function ReLU [56] from the current input x_i and the previous hidden state h_{i−1}:

h_i = 0 if i = 0;  h_i = ReLU(U · h_{i−1} + V · x_i + b) otherwise.    (6)

Here V ∈ ℝ^{H×dim} is a weight matrix, U ∈ ℝ^{H×H} is a weight matrix and b ∈ ℝ^H is a bias term. The output at the last step, h_n, can be regarded as the representation of the description o_i.d.

3. TagRNN has a fully connected layer followed by a softmax non-linear layer that predicts the probability distribution over tags:

P = softmax(W · h_n + B).    (7)

The k tags with the highest probability ranking are recommended to the software object o_i.
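The TagRNN forward pass of Eqs. (6) and (7) can be spelled out directly. The NumPy sketch below uses random, untrained parameters and illustrative sizes; it only demonstrates the shapes and the recurrence, not a trained model.

import numpy as np

rng = np.random.default_rng(0)
dim, H, num_tags, n = 200, 128, 500, 30       # illustrative sizes, not the paper's exact settings

U = rng.normal(scale=0.1, size=(H, H))        # hidden-to-hidden weights of Eq. (6)
V = rng.normal(scale=0.1, size=(H, dim))      # input-to-hidden weights of Eq. (6)
b = np.zeros(H)
W = rng.normal(scale=0.1, size=(num_tags, H)) # output weights of Eq. (7)
B = np.zeros(num_tags)

x = rng.normal(size=(n, dim))                 # word vectors x_1..x_n of one description

h = np.zeros(H)                               # h_0 = 0
for x_i in x:                                 # Eq. (6): h_i = ReLU(U h_{i-1} + V x_i + b)
    h = np.maximum(0.0, U @ h + V @ x_i + b)

logits = W @ h + B                            # Eq. (7): fully connected layer on the last state h_n
P = np.exp(logits - logits.max())
P /= P.sum()                                  # softmax probability distribution over tags
top_k = np.argsort(P)[::-1][:5]               # indices of the k recommended tags
print(top_k, P[top_k])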

3.4. TagHAN

We propose a new TagHAN method that is based on HAN models. Hierarchical attention networks (HAN) [24] can select qualitatively informative words and sentences in an NLP analysis task. The overall architecture of TagHAN is given in Fig. 3. It contains several parts: a word encoder, a word-level attention layer, a sentence encoder and a sentence-level attention layer. We describe the details of the different components in the following.

Fig. 3. The overall architecture of TagHAN.

3.4.1. GRU-based sequence encoder

Fig. 4. The structure of GRU.

We first introduce the gated recurrent unit (GRU) [57], which is the basic unit in the word and sentence encoders. Fig. 4 shows the structure of the GRU. The GRU uses a gating mechanism to track the state of sequences. There are two types of gates: the reset gate r_i (r_i ∈ ℝ^H) and the update gate z_i (z_i ∈ ℝ^H). Together they control how information is updated in the state. At the i-th word or sentence, the GRU computes the new state as

h_i = (1 − z_i) ⊙ h_{i−1} + z_i ⊙ h̃_i.    (8)

This is a linear interpolation between the previous state h_{i−1} (h_{i−1} ∈ ℝ^H) and the current candidate state h̃_i (h̃_i ∈ ℝ^H); ⊙ denotes element-wise multiplication. The gate z_i decides how much past information is kept and

how much new information is added. The update gate z_i is computed as

z_i = σ(W_z x_i + U_z h_{i−1} + b_z),    (9)

where σ is the sigmoid activation function [58], W_z (W_z ∈ ℝ^{H×dim}) is a weight matrix, U_z (U_z ∈ ℝ^{H×H}) is a weight matrix, b_z ∈ ℝ^H is a bias term and x_i (x_i ∈ ℝ^dim) is the input word vector. The candidate state h̃_i is computed in a way similar to a vanilla RNN:

h̃_i = tanh(W_h x_i + r_i ⊙ (U_h h_{i−1}) + b_h),    (10)

where r_i is the reset gate, which controls how much the past state contributes to the candidate state; W_h (W_h ∈ ℝ^{H×dim}) is a weight matrix, U_h (U_h ∈ ℝ^{H×H}) is a weight matrix and b_h ∈ ℝ^H is a bias term. If r_i is a zero vector, the previous state is forgotten. The reset gate is updated as follows:

r_i = σ(W_r x_i + U_r h_{i−1} + b_r),    (11)

where W_r (W_r ∈ ℝ^{H×dim}) is a weight matrix, U_r (U_r ∈ ℝ^{H×H}) is a weight matrix and b_r ∈ ℝ^H is a bias term.
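Eqs. (8)–(11) amount to a single state-update function. A minimal NumPy sketch of one GRU step, with randomly initialized and untrained weights, is given below purely for illustration.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_i, h_prev, params):
    """One GRU update following Eqs. (8)-(11): update gate, reset gate, candidate state, new state."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x_i + Uz @ h_prev + bz)              # Eq. (9): update gate
    r = sigmoid(Wr @ x_i + Ur @ h_prev + br)              # Eq. (11): reset gate
    h_cand = np.tanh(Wh @ x_i + r * (Uh @ h_prev) + bh)   # Eq. (10): candidate state
    return (1.0 - z) * h_prev + z * h_cand                # Eq. (8): interpolation of old and candidate state

rng = np.random.default_rng(1)
dim, H = 100, 50
params = (rng.normal(scale=0.1, size=(H, dim)), rng.normal(scale=0.1, size=(H, H)), np.zeros(H),
          rng.normal(scale=0.1, size=(H, dim)), rng.normal(scale=0.1, size=(H, H)), np.zeros(H),
          rng.normal(scale=0.1, size=(H, dim)), rng.normal(scale=0.1, size=(H, H)), np.zeros(H))

h = np.zeros(H)
for x_i in rng.normal(size=(8, dim)):                     # scan a short word sequence
    h = gru_step(x_i, h, params)
print(h.shape)   # (50,)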

3.4.2. Hierarchical attention

For a software object o, we assume that the description o.d has L sentences s_i and that each sentence contains m_i words. x_ip represents the word vector of the p-th word in the i-th sentence of o.d. We obtain the vector representation x_ip from the pre-trained word vectors [37]. TagHAN projects the description o.d into a vector representation. In the following, we present how we build this vector progressively from word vectors by using the hierarchical structure.

Word encoder. We use a bidirectional GRU to get annotations of words by summarizing information from both directions. The bidirectional GRU contains a forward GRU (GRU_fwd), which reads the sentence s_i from x_{i1} to x_{im_i}, and a backward GRU (GRU_bwd), which reads from x_{im_i} to x_{i1}:

h_{ip} = GRU_fwd(x_{ip}), p ∈ [1, m_i],    (12)

h'_{ip} = GRU_bwd(x_{ip}), p ∈ [m_i, 1].    (13)

We obtain an annotation for a given word x_{ip} by concatenating the forward hidden state h_{ip} and the backward hidden state h'_{ip}, i.e., [h_{ip}, h'_{ip}], which summarizes the information of the whole sentence centered around x_{ip}.

Word attention. Not all words contribute equally to the representation of the sentence meaning. Hence, we introduce an attention mechanism to extract the words that are important to the meaning of the sentence and aggregate the representations of those informative words to form a sentence vector. Specifically:

u_{ip} = tanh(W_w [h_{ip}, h'_{ip}] + b_w),    (14)

a_{ip} = exp(u_{ip}^T u_w) / Σ_p exp(u_{ip}^T u_w),    (15)

s_i = Σ_p a_{ip} [h_{ip}, h'_{ip}].    (16)

That is, we first feed the word annotation [h_{ip}, h'_{ip}] through a one-layer perceptron to get u_{ip} as a hidden representation of [h_{ip}, h'_{ip}]. Then we measure the importance of the word as the similarity of u_{ip} with a word-level context vector u_w and get a normalized importance weight a_{ip} through a softmax function. After that, we compute the sentence vector s_i as a weighted sum of the word annotations based on these weights.

Sentence encoder. For a description o.d, we can get a document vector in a similar way. We use a bidirectional GRU to encode the sentences:

h_i = GRU_fwd(s_i), i ∈ [1, L],    (17)

h'_i = GRU_bwd(s_i), i ∈ [L, 1].    (18)

We concatenate h_i and h'_i to get an annotation of sentence i, i.e., [h_i, h'_i].

Sentence attention. To reward sentences that are important to correctly tag a software object, we again use an attention mechanism and introduce a sentence-level context vector u_s, using it to measure the importance of the sentences. Specifically:

u_i = tanh(W_s [h_i, h'_i] + b_s),    (19)

a_i = exp(u_i^T u_s) / Σ_i exp(u_i^T u_s),    (20)

v = Σ_i a_i [h_i, h'_i].    (21)

Here, v is the vector that summarizes all the information of the sentences in the description of the software object.

3.4.3. Tag recommendation

The vector v is a high-level representation of the description of a software object and can be used as features for tag recommendation. Finally, TagHAN has a fully connected layer followed by a softmax non-linear layer that predicts the probability distribution over tags:

P = softmax(W_c v + b_c).    (22)

The k tags with the highest probability ranking are recommended to the software object o.
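The word-level attention of Eqs. (14)–(16) and the sentence-level attention of Eqs. (19)–(21) share one pattern: a one-layer perceptron, a softmax over similarities to a context vector, and a weighted sum. The NumPy sketch below shows that shared pattern; the bidirectional GRU encoders are assumed to have already produced the annotation vectors, and all weights are random placeholders.

import numpy as np

def attention_pool(annotations, W, b, context):
    """Attention pooling as in Eqs. (14)-(16) and (19)-(21).

    annotations: array of shape (steps, 2H) with the concatenated forward/backward GRU states.
    Returns the attention-weighted sum of the annotations (shape (2H,)).
    """
    u = np.tanh(annotations @ W.T + b)     # hidden representation of each annotation
    scores = u @ context                   # similarity to the context vector u_w or u_s
    a = np.exp(scores - scores.max())
    a /= a.sum()                           # normalized importance weights (softmax)
    return a @ annotations                 # weighted sum -> sentence vector s_i or document vector v

rng = np.random.default_rng(2)
two_H, attn_dim, words = 100, 100, 12
word_annotations = rng.normal(size=(words, two_H))    # [h_ip, h'_ip] for one sentence
W_w = rng.normal(scale=0.1, size=(attn_dim, two_H))
b_w, u_w = np.zeros(attn_dim), rng.normal(size=attn_dim)
s_i = attention_pool(word_annotations, W_w, b_w, u_w)  # sentence vector, Eq. (16)
print(s_i.shape)   # (100,)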

3.5. TagRCNN

Finally, we propose TagRCNN, a new tag recommendation method that is based on RCNN. As an RNN tends to put more weight on recent information, so that later words are more dominant than earlier ones, a Recurrent Convolutional Neural Network (RCNN) [25] can be used to address this limitation. The overall architecture of TagRCNN is shown in Fig. 5. The input of TagRCNN is the description d of a software object o, which is a sequence of words w_1, w_2, ⋯, w_n. The output of TagRCNN contains tags. We use p(t | o, θ) to denote the probability of the software object being labeled with tag t, where θ denotes the parameters of the model and o is a given software object. We describe the details of each step in the following.

Fig. 5. The overall architecture of TagRCNN.

1. TagRCNN uses a recurrent structure, a bi-directional recurrent neural network, to capture the contexts of a word. TagRCNN combines a word and its context to represent the word. We define c_l(w_i) (c_l(w_i) ∈ ℝ^{|c|}) as the left context of word w_i and c_r(w_i) (c_r(w_i) ∈ ℝ^{|c|}) as the right context of word w_i. Both c_l(w_i) and c_r(w_i) are dense vectors with |c| float-valued elements. In the experiments of this paper, |c| is set to 50. The left context c_l(w_i) of word w_i is calculated by Eq. (23), where e(w_{i−1}) (e(w_{i−1}) ∈ ℝ^{|e|}) is the word embedding of word w_{i−1}, which is a dense vector with |e| float-valued elements, and c_l(w_{i−1}) is the left context of the previous word w_{i−1}. The left context of the first word in any description uses the same shared parameters c_l(w_1). W_l (W_l ∈ ℝ^{|c|×|c|}) is a matrix that transforms the hidden layer into the next hidden layer, and W_sl (W_sl ∈ ℝ^{|c|×|e|}) is a matrix that is used to combine the semantics of the current word with the next word's context. f is a non-linear activation function. The right context c_r(w_i) is obtained in a similar manner, as shown in Eq. (24). The right context of the last word in a description shares the parameters c_r(w_n).

c_l(w_i) = f(W_l c_l(w_{i−1}) + W_sl e(w_{i−1})).    (23)

c_r(w_i) = f(W_r c_r(w_{i+1}) + W_sr e(w_{i+1})).    (24)

Here W_r (W_r ∈ ℝ^{|c|×|c|}) and W_sr (W_sr ∈ ℝ^{|c|×|e|}) are weight matrices.

2. TagRCNN defines the representation of word w_i in Eq. (25), which is the concatenation of the left context c_l(w_i), the word embedding e(w_i) and the right context c_r(w_i). The recurrent structure can obtain

all c_l in a forward scan of the description and all c_r in a backward scan of the description.

x_i = [c_l(w_i) : e(w_i) : c_r(w_i)].    (25)

After obtaining the representation x_i (x_i ∈ ℝ^{|e|+2|c|}) of the word w_i, TagRCNN applies a linear transformation together with the tanh activation function to x_i and sends the result to the next layer:

y_i^(2) = tanh(W^(2) x_i + b^(2)).    (26)

y_i^(2) (y_i^(2) ∈ ℝ^{|H|}) is the latent semantic vector, in which each semantic factor will be analyzed to determine the most useful factor for representing the description. |H| is a hyper-parameter; in the experiments of this paper, |H| is set to 100. W^(2) (W^(2) ∈ ℝ^{|H|×(|e|+2|c|)}) is a weight matrix and b^(2) (b^(2) ∈ ℝ^{|H|}) is a bias vector.

3. When the representations of all words have been calculated, a max-pooling layer is applied:

y^(3) = max_{i=1..n} y_i^(2).    (27)

The max function is an element-wise function: the q-th element of y^(3) (y^(3) ∈ ℝ^{|H|}) is the maximum of the q-th elements of the y_i^(2). The pooling layer converts descriptions with various lengths into a fixed-length vector. With the pooling layer, TagRCNN captures information throughout the entire description of a software object. The max-pooling layer attempts to find the most important latent semantic factors in a description. The last part of TagRCNN is an output layer. Similar to traditional neural networks, it is defined as:

y^(4) = W^(4) y^(3) + b^(4).    (28)

Here y^(4) (y^(4) ∈ ℝ^{|𝒯|}) is the output vector, W^(4) (W^(4) ∈ ℝ^{|𝒯|×|H|}) is a weight matrix and b^(4) (b^(4) ∈ ℝ^{|𝒯|}) is a bias vector.

4. TagRCNN applies the softmax function to y^(4) to get the probability distribution over tags. The k tags with the highest probability ranking are recommended to the software object o:

P_i = exp(y_i^(4)) / Σ_q exp(y_q^(4)).    (29)
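As an illustration of Eqs. (23)–(29), the sketch below computes the left and right contexts with a forward and a backward scan, builds the word representations, max-pools them and applies the output softmax. The weights are random and untrained, tanh stands in for the unspecified activation f, and the boundary contexts are simply zero vectors rather than the shared learned parameters described above.

import numpy as np

rng = np.random.default_rng(3)
n, e_dim, c_dim, H, num_tags = 10, 100, 50, 100, 300   # illustrative sizes (|c| = 50, |H| = 100 as in the paper)

E = rng.normal(size=(n, e_dim))                         # word embeddings e(w_1..w_n)
W_l, W_sl = rng.normal(scale=0.1, size=(c_dim, c_dim)), rng.normal(scale=0.1, size=(c_dim, e_dim))
W_r, W_sr = rng.normal(scale=0.1, size=(c_dim, c_dim)), rng.normal(scale=0.1, size=(c_dim, e_dim))

c_l = np.zeros((n, c_dim))                              # left contexts, Eq. (23), forward scan
for i in range(1, n):
    c_l[i] = np.tanh(W_l @ c_l[i - 1] + W_sl @ E[i - 1])

c_r = np.zeros((n, c_dim))                              # right contexts, Eq. (24), backward scan
for i in range(n - 2, -1, -1):
    c_r[i] = np.tanh(W_r @ c_r[i + 1] + W_sr @ E[i + 1])

X = np.concatenate([c_l, E, c_r], axis=1)               # Eq. (25): x_i = [c_l(w_i) : e(w_i) : c_r(w_i)]
W2, b2 = rng.normal(scale=0.1, size=(H, X.shape[1])), np.zeros(H)
Y2 = np.tanh(X @ W2.T + b2)                             # Eq. (26): latent semantic vectors y_i^(2)
y3 = Y2.max(axis=0)                                     # Eq. (27): element-wise max-pooling over the description
W4, b4 = rng.normal(scale=0.1, size=(num_tags, H)), np.zeros(num_tags)
y4 = W4 @ y3 + b4                                       # Eq. (28): output layer
P = np.exp(y4 - y4.max()); P /= P.sum()                 # Eq. (29): softmax over tags
print(P.shape, P.argsort()[::-1][:5])                   # probabilities and top-5 recommended tag indices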

4. Experimental design

In order to investigate whether any of our proposed deep learning-based tag recommendation approaches can outperform the three traditional approaches, we conducted an empirical study to answer the following two research questions:

• RQ1: Which of the four proposed methods – TagCNN, TagRNN, TagHAN and TagRCNN – is better than the three traditional approaches – EnTagRec, TagMulRec, and FastTagRec – for the software site tag recommendation problem?

• RQ2: Is the computational cost of model training for the proposed methods acceptable?

4.1. Datasets and experiment setup

Our experimental datasets come from ten software information sites. We evaluated our four deep learning methods and three traditional approaches on one large-scale software information site, StackOverflow, three medium-scale software information sites, Askubuntu, Serverfault and Unix, and six small-scale sites, Codereview, Freecode, Database Administrator, Wordpress, AskDifferent and Software Engineering.

For StackOverflow, we selected software objects posted before July 1st, 2014, the same date setting as used in [26], to facilitate comparison in our empirical study. For the other nine datasets, we consider all the software objects posted before Dec 31st, 2016. We define a site as a large-scale site if the number of software objects in the site is more than 1 million, as a medium-scale site if the number of software objects is between 100k and 1 million, and as a small-scale site if the number of software objects is less than 100k.

We used the same data preprocessing rules as in [26]. First, no conversion operation is applied to tags other than converting all tags to lowercase. Then we remove rare tags and software objects: a tag is rare if its number of appearances is less than or equal to a predefined threshold ts, and a software object is removed if all its tags are rare. The threshold value ts = 50 was used in prior work [14,15,26]. In this paper, we also set the threshold value ts to 50 for the ten software information sites. Table 1 summarizes the number of tags and software objects after removing the rare ones under threshold values 1 and 50. It can be observed from Table 1 that the number of software objects ranges from about 40K to more than ten million for these software information sites. When the threshold value is changed from 1 to 50, 1.811% of the objects and 66.004% of the tags are removed from these sites. Second, for the remaining software objects in Table 1, we further remove code snippets and screenshots from their descriptions; that is, only the text in the description is preserved. Both preprocessing steps were also used in prior work [14,15,26]. After the two preprocessing steps, the preprocessed datasets are used to train and test tag recommendation models.
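The rare-tag filtering rule described above can be expressed in a few lines. The following sketch uses hypothetical variable names and is not our actual preprocessing script.

from collections import Counter

def preprocess(objects, ts=50):
    """Lower-case tags, drop tags appearing <= ts times, then drop objects whose tags are all rare."""
    objects = [(d, {t.lower() for t in tags}) for d, tags in objects]
    counts = Counter(tag for _, tags in objects for tag in tags)
    kept = {tag for tag, c in counts.items() if c > ts}
    filtered = [(d, tags & kept) for d, tags in objects if tags & kept]
    return filtered, kept

# Toy example with a tiny threshold just to show the behaviour.
objs = [("q1", {"Python", "json"}), ("q2", {"python"}), ("q3", {"cobol"})]
filtered, kept = preprocess(objs, ts=1)
print(kept)        # {'python'}: 'json' and 'cobol' appear only once and are removed
print(len(filtered))  # 2: the object tagged only 'cobol' is dropped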

In order to be consistent with previous research [14,15,26,27], in our experiments we also performed a ten-round validation on each dataset to evaluate TagCNN, TagRNN, TagHAN and TagRCNN.

Table 1

Statistics of the ten datasets on different rare tag threshold values.

Site Name  ts  #software objects  #tags
StackOverflow  1  11,203,032  44,265
StackOverflow  50  11,193,348  18,952
Askubuntu  1  248,630  3,041
Askubuntu  50  246,138  1,146
Serverfault  1  232,996  3,482
Serverfault  50  231,319  1,312
Unix  1  104,744  2,407
Unix  50  103,243  770
Codereview  1  39,989  909
Codereview  50  39,811  302
Freecode  1  47,978  9,018
Freecode  50  43,644  274
Database Administrator  1  51,031  969
Database Administrator  50  50,687  293
Wordpress  1  71,338  770
Wordpress  50  70,491  403
AskDifferent  1  77,978  1,049
AskDifferent  50  77,503  469
Software Engineering  1  42,782  1,628
Software Engineering  50  41,531  418


For each preprocessed dataset, we randomly selected 10,000 software objects and treated them as a test set V. The remaining software objects in a dataset were treated as a training set and used to recommend tags for the 10,000

selected ones. For an information site, all text in the training set is used to train word vectors with Word2Vec [37]. In our experiments, we set the dimension of the word vectors to 300 for the large-scale site, 200 for medium-scale sites, and 100 for small-scale sites. For an object o with |T| tags in the training set, we encode each tag t_i using one-hot encoding and obtain |T| training instances {(o.d, t_1), ..., (o.d, t_|T|)}. When a new tag is added, we increase the size of the output layer for the new tag. As the layer from the penultimate layer to the output layer is fully connected, the model only needs to add the weight vectors associated with new tags.
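The construction of the |T| one-hot training instances for an object can be sketched as follows; the tag indexing scheme shown here is illustrative rather than the exact implementation.

import numpy as np

def make_instances(description_tokens, tags, tag_index):
    """Turn one software object into |T| training instances (o.d, one_hot(t_i)), as described above."""
    instances = []
    for tag in tags:
        label = np.zeros(len(tag_index), dtype=np.float32)
        label[tag_index[tag]] = 1.0                  # one-hot encoding of a single tag
        instances.append((description_tokens, label))
    return instances

tag_index = {"python": 0, "json": 1, "flask": 2}      # index over the site's non-rare tag vocabulary
pairs = make_instances(["parse", "json", "in", "python"], {"python", "json"}, tag_index)
print(len(pairs), pairs[0][1])                        # 2 instances, each paired with a one-hot label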

For the TagCNN method, we set the region sizes to (2, 3, 4, 5), with 25 filters for each region size. For the TagRNN method, the hyper-parameter |H| is set to 128 for small-scale and medium-scale sites and 256 for the large-scale site. For the TagHAN method, the hyper-parameter |H| of the GRU is set to 50. For the TagRCNN method, the hyper-parameter |H| is set to 100 and the hyper-parameter |c| is set to 50. For the TagCNN, TagHAN and TagRCNN methods, in order to reduce memory requirements and speed up training, the max sequence length msl is set to 400 for the large-scale site and 600 for medium-scale and small-scale sites. If the word sequence length in the description d of an object o exceeds the max sequence length msl, we only take the first msl words of the description d. For the TagRNN method, we do not set a max sequence length and use all words in the description of an object as the input of the model.

For each software object o_i ∈ V, we recommend k tags to form a

tag set TR_i^k. We repeated the process ten times and compared TagCNN, TagRNN, TagHAN and TagRCNN against TagMulRec. All experiments were conducted on a 64-bit, Intel Core i7 3.6 GHz desktop computer with 64 GB RAM running Ubuntu 16.04. We used the open source software library TensorFlow 3 to implement our methods TagCNN, TagRNN, TagHAN and TagRCNN. The code, parameters and configurations of these methods, and all datasets used in the experiments, can be accessed via the link https://pan.baidu.com/s/1pKCpodP.

3 http://www.tensorflow.org

4.2. Evaluation metrics

To be consistent with the experiments reported in [15,26,27], we use the top-k prediction recall, the top-k prediction precision, and the top-k

prediction F1-score when evaluating the performance of TagCNN, TagRNN, TagHAN and TagRCNN. The top-k prediction recall is the percentage of tags in the recommended list TR_i^k that appear among the software object's true tags. Given a software object o_i, the top-k prediction recall Recall@k_i is computed by Eq. (30). Given a set V of software objects, the top-k prediction recall Recall@k is computed by Eq. (31).

Recall@k_i = |TR_i^k ∩ o_i.T| / k,  if |o_i.T| > k;   Recall@k_i = |TR_i^k ∩ o_i.T| / |o_i.T|,  if |o_i.T| ≤ k.    (30)

Recall@k = (Σ_{i=1}^{|V|} Recall@k_i) / |V|.    (31)

The top-k prediction precision is the percentage of a software object's tags that are in the recommended list TR_i^k. Given a software object o_i, the top-k prediction precision Precision@k_i is defined by Eq. (32). Given a set V of software objects, the top-k prediction precision Precision@k is computed by Eq. (33).

Precision@k_i = |TR_i^k ∩ o_i.T| / k.    (32)

Precision@k = (Σ_{i=1}^{|V|} Precision@k_i) / |V|.    (33)

However, a higher precision rate is not the main target in our scenario. Precision@k_i is inversely proportional to the k value, and the k value indicates the number of tags that we want to recommend to the developer. For example, if a software object has two tags and they are both among the top five tags recommended to the developer (namely, the k value is 5), the precision value is 40%. However, if the k value increases to 10, then the precision value will only be 20%. Therefore, a lower precision rate in this work does not imply there are many wrong tags, but that the number of tags recommended to the developer is larger. Clearly, a good precision rate in our work indicates that a reasonable number of tags has been recommended to the developer, which is much more user friendly than recommending a long list of tags.

The top-k prediction F1-score combines top-k prediction recall and top-k prediction precision. Given a software object o_i, the top-k prediction F1-score F1-score@k_i is defined by Eq. (34). Given a set V of software objects, the top-k prediction F1-score F1-score@k is defined by Eq. (35).

F1-score@k_i = 2 · Precision@k_i · Recall@k_i / (Precision@k_i + Recall@k_i).    (34)

F1-score@k = (Σ_{i=1}^{|V|} F1-score@k_i) / |V|.    (35)
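Eqs. (30)–(35) translate directly into code. The sketch below simply restates those definitions for a set of (ranked recommendation list, true tag set) pairs.

def recall_at_k(recommended, true_tags, k):
    """Eq. (30): denominator is k when the object has more than k true tags, else |o_i.T|."""
    hits = len(set(recommended[:k]) & true_tags)
    return hits / k if len(true_tags) > k else hits / len(true_tags)

def precision_at_k(recommended, true_tags, k):
    """Eq. (32)."""
    return len(set(recommended[:k]) & true_tags) / k

def f1_at_k(recommended, true_tags, k):
    """Eq. (34); zero when both precision and recall are zero."""
    p, r = precision_at_k(recommended, true_tags, k), recall_at_k(recommended, true_tags, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def evaluate(predictions, k):
    """Eqs. (31), (33), (35): averages over the test set V."""
    rs = [recall_at_k(rec, true, k) for rec, true in predictions]
    ps = [precision_at_k(rec, true, k) for rec, true in predictions]
    fs = [f1_at_k(rec, true, k) for rec, true in predictions]
    n = len(predictions)
    return sum(rs) / n, sum(ps) / n, sum(fs) / n

V = [(["python", "json", "flask", "api", "rest"], {"python", "json"})]   # (ranked recommendations, true tags)
print(evaluate(V, k=5))   # (1.0, 0.4, ~0.571)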

4.3. Experimental results

In this section, we present our experimental results to answer the aforementioned research questions.

4.3.1. RQ1: Which of the four proposed methods – TagCNN, TagRNN, TagHAN and TagRCNN – is better than the three traditional approaches – EnTagRec, TagMulRec, and FastTagRec – for the software site tag recommendation problem?

Motivation. Our four approaches (TagCNN, TagRNN, TagHAN and TagRCNN) are based on four different deep learning models that are very different from the three traditional approaches EnTagRec, TagMulRec, and FastTagRec. The answer to this research question will shed light on whether any of the chosen contemporary deep learning techniques can improve the performance of our software site tag recommendation task.

Approach. For each preprocessed dataset, we applied our four approaches and the three traditional approaches to train tag recommendation models respectively, and then used the test set V to evaluate the performance of these models. We compared the Recall@k, Precision@k and F1-score@k (k = 5 and 10) metrics of the different approaches.

Table 2

Four deep learning approaches vs. three traditional approaches in terms of Recall@k (k = 5 and 10).

Site Name Recall@5

TagMulRec EnTagRec FastTagRec TagCNN TagRNN TagHAN TagRCNN

StackOverflow 0.640 – 0.698 0.711 0.672 0.731 0.877

Askubuntu 0.603 – 0.684 0.656 0.603 0.624 0.849

Serverfault 0.622 – 0.666 0.708 0.633 0.650 0.851

Unix 0.604 – 0.627 0.793 0.580 0.617 0.795

Codereview 0.718 0.707 0.758 0.956 0.459 0.571 0.765

Freecode 0.659 0.641 0.588 0.902 0.332 0.374 0.756

Database Administrator 0.666 0.672 0.692 0.958 0.625 0.667 0.805

Wordpress 0.605 0.801 0.632 0.873 0.622 0.646 0.797

AskDifferent 0.708 0.878 0.689 0.947 0.639 0.663 0.838

Software Engineering 0.594 0.564 0.582 0.936 0.439 0.501 0.700

Recall@10

Site Name TagMulRec EnTagRec FastTagRec TagCNN TagRNN TagHAN TagRCNN

StackOverflow 0.749 – 0.774 0.804 0.765 0.822 0.924

Askubuntu 0.721 – 0.770 0.791 0.714 0.737 0.898

Serverfault 0.716 – 0.753 0.828 0.740 0.756 0.896

Unix 0.682 – 0.722 0.899 0.690 0.727 0.848

Codereview 0.788 0.773 0.820 0.980 0.539 0.675 0.814

Freecode 0.758 0.751 0.692 0.949 0.416 0.484 0.816

Database Administrator 0.778 0.794 0.816 0.978 0.742 0.777 0.884

Wordpress 0.725 0.864 0.765 0.945 0.731 0.760 0.863

AskDifferent 0.827 0.940 0.815 0.978 0.741 0.768 0.897

Software Engineering 0.704 0.697 0.708 0.961 0.535 0.599 0.764

Table 3

Four deep learning approaches vs. three traditional approaches in terms of Precision@k (k = 5 and 10).

Site Name Precision@5

TagMulRec EnTagRec FastTagRec TagCNN TagRNN TagHAN TagRCNN

StackOverflow 0.343 – 0.386 0.396 0.305 0.301 0.487

Askubuntu 0.271 – 0.346 0.329 0.311 0.311 0.441

Serverfault 0.305 – 0.344 0.369 0.343 0.335 0.466

Unix 0.294 – 0.309 0.399 0.303 0.302 0.416

Codereview 0.377 0.371 0.398 0.520 0.256 0.281 0.421

Freecode 0.383 0.379 0.343 0.530 0.211 0.218 0.478

Database Administrator 0.313 0.324 0.332 0.479 0.315 0.318 0.404

Wordpress 0.265 0.348 0.278 0.397 0.283 0.285 0.374

AskDifferent 0.372 0.365 0.357 0.506 0.342 0.344 0.450

Software Engineering 0.253 0.245 0.252 0.436 0.217 0.221 0.325

Precision@10

Site Name TagMulRec EnTagRec FastTagRec TagCNN TagRNN TagHAN TagRCNN

StackOverflow 0.205 – 0.217 0.232 0.182 0.171 0.279

Askubuntu 0.166 – 0.198 0.203 0.188 0.187 0.233

Serverfault 0.179 – 0.198 0.222 0.205 0.200 0.245

Unix 0.169 – 0.182 0.231 0.184 0.183 0.222

Codereview 0.211 0.207 0.218 0.268 0.153 0.170 0.224

Freecode 0.245 0.239 0.219 0.297 0.138 0.151 0.258

Database Administrator 0.188 0.194 0.201 0.246 0.190 0.189 0.222

Wordpress 0.163 0.186 0.173 0.219 0.170 0.172 0.203

AskDifferent 0.222 0.201 0.216 0.263 0.202 0.203 0.240

Software Engineering 0.153 0.151 0.157 0.225 0.135 0.136 0.177


ation models respectively and then used the test set V to evaluate theerformance of these models. We compared the Recall @ k, Precision @ k

nd 𝐹 1 − 𝑠𝑐𝑜𝑟𝑒 @𝑘 (k = 5 and 10) metrics of different approaches. Results. Tables 2–4 give the results in terms of Recall @ k, Preci-

ion @ k and 𝐹 1 − 𝑠𝑐𝑜𝑟𝑒 @𝑘 (k = 5 and 10) metrics, respectively. In theseables, Column 1 lists the names of the software information sites andolumn 2–4 lists the results of the three traditional approaches EnTa-Rec, TagMulRec, and FastTagRec. For the approach EnTagRec, we listhe experimental results on small-scale sites. Columns 5–8 list the resultsf our four approaches TagCNN, TagRNN, TagHAN and TagRCNN. Theighest scores for each row in these tables are marked in bold.

It can be observed that TagCNN and TagRCNN outperform the traditional approaches in terms of Recall@k, Precision@k and F1-score@k in all the k settings on all ten software information sites. For the one large-scale and three medium-scale software information sites, TagRCNN almost always achieved the best performance. For the six small-scale software information sites, TagCNN achieved the best performance. The Wilcoxon signed-rank test [59,60] confirms that the performance improvements of the TagCNN and TagRCNN methods are statistically significant (p-value < 0.001).
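As a hedged illustration of this significance test, the sketch below compares paired per-site scores of two approaches with SciPy; the two lists are Recall@5 values for TagRCNN and FastTagRec on five sites taken from Table 2 purely for illustration, whereas the test reported above is run over the full set of paired observations.

# Minimal sketch: Wilcoxon signed-rank test on paired per-site scores.
from scipy.stats import wilcoxon

tagrcnn_recall5    = [0.877, 0.849, 0.851, 0.795, 0.765]  # values from Table 2
fasttagrec_recall5 = [0.698, 0.684, 0.666, 0.627, 0.758]  # same five sites

statistic, p_value = wilcoxon(tagrcnn_recall5, fasttagrec_recall5)
print(f"W = {statistic:.3f}, p = {p_value:.4f}")
# A small p-value indicates the paired differences are unlikely to arise by
# chance, i.e. the improvement is statistically significant.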

However, the performance of the TagRNN and TagHAN approaches is not as good as that of these traditional approaches on some software information sites. For the TagRNN method, we assume that the reason for the lack of good results is that TagRNN puts more weight on recent information, where later words are more dominant than earlier ones.


Table 4

Four deep learning approaches vs. three traditional approaches in terms of F1 - score@k (k = 5 and 10).

Site Name F1 - score@5

TagMulRec EnTagRec FastTagRec TagCNN TagRNN TagHAN TagRCNN

StackOverflow 0.444 – 0.476 0.509 0.420 0.426 0.626

Askubuntu 0.374 – 0.437 0.439 0.410 0.415 0.581

Serverfault 0.403 – 0.435 0.485 0.445 0.442 0.602

Unix 0.395 – 0.397 0.531 0.398 0.406 0.546

Codereview 0.494 0.485 0.502 0.674 0.329 0.376 0.543

Freecode 0.485 0.476 0.434 0.668 0.258 0.275 0.586

Database Administrator 0.426 0.438 0.449 0.639 0.419 0.430 0.538

Wordpress 0.368 0.484 0.386 0.546 0.389 0.396 0.509

AskDifferent 0.488 0.514 0.471 0.660 0.446 0.453 0.586

Software Engineering 0.355 0.340 0.352 0.595 0.290 0.307 0.443

F1 - score@10

Site Name TagMulRec EnTagRec FastTagRec TagCNN TagRNN TagHAN TagRCNN

StackOverflow 0.310 – 0.329 0.360 0.294 0.283 0.429

Askubuntu 0.270 – 0.303 0.323 0.297 0.299 0.370

Serverfault 0.287 – 0.304 0.350 0.321 0.316 0.385

Unix 0.271 – 0.282 0.367 0.291 0.292 0.352

Codereview 0.333 0.325 0.335 0.421 0.239 0.271 0.351

Freecode 0.364 0.360 0.332 0.453 0.208 0.230 0.392

Database Administrator 0.201 0.310 0.323 0.393 0.303 0.304 0.355

Wordpress 0.266 0.305 0.283 0.356 0.276 0.280 0.328

AskDifferent 0.350 0.330 0.342 0.415 0.317 0.321 0.378

Software Engineering 0.252 0.248 0.257 0.365 0.216 0.221 0.287

Table 5

The K most popular tags method in terms of Recall@k, Precision@k, and F1-score@k (K = 5 and 10).

Dataset Recall@5 Precision@5 F1 - score@5

StackOverflow 0.096 0.054 0.067

Dataset Recall@10 Precision@10 F1 - score@10

StackOverflow 0.138 0.078 0.097


The TagHAN method is based on the hierarchical structure of the description text; for the TagHAN method, we assume that the reason for its failure to achieve good results is that the length of the description text in most software objects is short. For the TagCNN and TagRCNN methods, we assume that the reason for the good results is that both methods contain a convolutional layer and a max-pooling layer. The convolutional operation can eliminate bias and the max-pooling operation can fairly determine the discriminative phrases in a text [25].

In addition, to evaluate the K most popular tags method, which is an obvious baseline, we randomly extracted 100k software objects with tags from the StackOverflow data set. The experimental results for the K (K = 5 and 10) most popular tags method are shown in Table 5. The results show that the performance of the K most popular tags method is poor compared with the other methods.
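A baseline of this kind can be sketched in a few lines; the snippet below (with illustrative tag sets rather than our dataset) simply counts tag frequencies over the training objects and recommends the same K most popular tags to every test object, which is why its Recall@k and Precision@k are so low.

# Minimal sketch of the "K most popular tags" baseline: recommend the globally
# most frequent training tags to every object, regardless of its content.
from collections import Counter

train_tags = [
    {"python", "pandas"}, {"java", "spring"}, {"python", "django"},
    {"python", "flask"}, {"java", "maven"},
]  # illustrative tag sets, not the real training data

tag_counts = Counter(tag for tags in train_tags for tag in tags)

def recommend(k):
    return [tag for tag, _ in tag_counts.most_common(k)]

print(recommend(5))  # the same list is returned for every software object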

Our experimental results show that TagCNN and TagRCNN, which are based on the deep learning models CNN and RCNN respectively, significantly outperform the three traditional approaches EnTagRec, TagMulRec, and FastTagRec on all ten software information sites. TagRNN and TagHAN, which are based on the deep learning models RNN and HAN respectively, are not as good as these traditional approaches on some of the software information sites.

4.3.2. RQ2: Is the computational cost of the model training for the proposed method acceptable?

Motivation. The four deep learning-based models (TagCNN, TagRNN, TagHAN and TagRCNN) and the three models based on traditional approaches (EnTagRec, TagMulRec, and FastTagRec) all need to be trained before they can be used for tag recommendation. For training the deep learning-based models, we first train word vectors through Word2Vec as the input of these models. The cost of training the word vectors is included in the computational cost of training these deep learning-based models. Training of EnTagRec, TagMulRec, FastTagRec, TagCNN, TagRNN, TagHAN and TagRCNN is done only once and off-line. After model training, using these models to recommend tags takes negligible time on-line. Understanding the training time cost helps us to better understand the scalability of each approach.
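As a minimal sketch of this word-vector preprocessing step (assuming the gensim library, version 4.x; the corpus, dimensionality and other hyperparameters here are illustrative rather than the exact settings used in our experiments):

# Minimal sketch: train word vectors on tokenized software-object descriptions
# and reuse them as the input embeddings of the tag recommendation models.
from gensim.models import Word2Vec

corpus = [
    ["how", "to", "parse", "json", "in", "python"],
    ["java", "nullpointerexception", "when", "calling", "method"],
]  # illustrative tokenized descriptions, not the real corpus

w2v = Word2Vec(sentences=corpus, vector_size=100, window=5,
               min_count=1, workers=4)   # hyperparameters are examples only
vector = w2v.wv["python"]                # 100-dimensional word vector
print(vector.shape)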

Approach. For the four deep learning-based models TagCNN, TagRNN, TagHAN and TagRCNN, we record the start time and the end time of the program for training the word vectors and the models to obtain the training time. For the three traditional models EnTagRec, TagMulRec and FastTagRec, we record the start time and the end time of the program for training the models to obtain the training time.
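A minimal sketch of this measurement (the two training functions are placeholders standing in for the actual training programs):

# Minimal sketch: wall-clock training cost measured as end time minus start time.
import time

def train_word_vectors():   # placeholder for the Word2Vec training step
    time.sleep(0.1)

def train_tag_model():      # placeholder for training one recommendation model
    time.sleep(0.2)

start = time.perf_counter()
train_word_vectors()        # only the deep learning-based models include this step
train_tag_model()
elapsed = time.perf_counter() - start
print(f"Training time: {elapsed:.1f} s")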

Results. Table 6 gives the training costs of the EnTagRec, TagMulRec, FastTagRec, TagCNN, TagRNN, TagHAN and TagRCNN models. Compared with TagMulRec on the ten sites, the average training cost of TagRNN is 2.1 times that of TagMulRec, the average training cost of TagHAN is about 5.8 times that of TagMulRec, the average training cost of TagCNN is about 5.7 times that of TagMulRec, and the average training cost of TagRCNN is about 19.3 times that of TagMulRec. Compared with the four deep learning-based approaches on the six small-scale sites, the average training cost of EnTagRec is about 169 times that of TagRNN, 64 times that of TagHAN, 53 times that of TagCNN and 14 times that of TagRCNN. Compared with FastTagRec on the ten sites, the average training cost of TagRNN is 9.2 times that of FastTagRec, the average training cost of TagHAN is 25.3 times that of FastTagRec, the average training cost of TagCNN is 27.8 times that of FastTagRec, and the average training cost of TagRCNN is 101.4 times that of FastTagRec. Because the training cost is a one-time expense, we conclude that the training costs of TagCNN and TagRCNN are acceptable.

Training of tag recommendation models can be done off-line and only needs to be done once. TagCNN and TagRCNN are practical for use on datasets of various scales.


Table 6

Four deep learning approaches vs. three traditional approaches in terms of the training model time.

Training Time (s)

Site Name TagMulRec EnTagRec FastTagRec TagCNN TagRNN TagHAN TagRCNN

StackOverflow 41,808 – 28,783 213,303 84,433 190,156 203,770

Askubuntu 908 – 372 5466 1682 4601 5111

Serverfault 246 – 87 1467 537 1392 5879

Unix 621 – 101 3585 1038 2812 8772

Codereview 495 51,959 83 2860 857 2329 6956

Freecode 95 55,963 48 581 452 1319 1494

Database Administrator 311 73,765 32 1776 566 1580 6737

Wordpress 361 91,229 34 2059 633 1771 9065

AskDifferent 211 99,204 90 1273 227 551 6417

Software Engineering 205 54,160 44 1194 441 1295 7751


5. Discussion

In this section, we first share some of the important lessons that we learned in implementing the work in this paper, followed by a discussion of threats to validity.

5.1. Implementation of deep learning-based methods

Many typical software engineering problems can be regarded as classification problems. Classification problems include binary classification problems, multi-class classification problems and multi-label classification problems. For example, defect prediction [61], code clone detection [3] and missing issue-commit link recovery [62] are binary classification problems, where one object is associated with one of two categories; predicting semantically linkable knowledge in developer online forums [5] and the program classification problem [63] are multi-class classification problems, where one object is associated with one of multiple categories (more than two); automated tag recommendation for software information sites [14–16,26] is a multi-label classification problem, where one object can be associated with several categories simultaneously. Obviously, the research problem in this paper is a multi-label classification problem. For classification problems, high quality vector representations of the observed object or artifact can significantly improve the accuracy of deep learning-based models [37,64]. However, for a typical software engineering problem that is regarded as a classification problem, how to best represent the software object or artifact with a high quality vector is still a major challenge.
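To make the multi-label framing concrete, the sketch below (assuming scikit-learn; the tag sets are illustrative) shows how each software object's tags become a multi-hot target vector, in contrast to the single class index used in binary or multi-class problems.

# Minimal sketch: encode tag sets as multi-hot vectors for multi-label learning.
from sklearn.preprocessing import MultiLabelBinarizer

object_tags = [
    {"python", "pandas"},          # one software object, several tags at once
    {"java"},
    {"python", "django", "orm"},
]  # illustrative tag sets, not the real data

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(object_tags)   # shape: (num_objects, num_distinct_tags)
print(mlb.classes_)                  # e.g. ['django' 'java' 'orm' 'pandas' 'python']
print(Y)                             # each row is a multi-hot target vector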

In this paper, for each software object, we utilized four different deep neural network structures (TagCNN, TagRNN, TagHAN and TagRCNN) to obtain different vector representations for use by each deep learning-based model. TagRNN puts more weight on recent information, where later words are more dominant than earlier ones, and hence the vector representation of a software object obtained by the TagRNN method also puts more weight on recent information. The TagHAN method is based on the hierarchical structure of the text, and the quality of the vector will not be ideal if the text is short. Therefore, the major reason for our failure to achieve a high quality vector representation of a software object with the TagHAN method is that the length of the description text in most software objects is short. For the TagCNN and TagRCNN methods, the quality of the vector representation of a software object is high, as both of these methods contain a convolutional layer and a max-pooling layer. The convolutional operation can eliminate bias and the max-pooling operation can fairly determine the discriminative phrases in a text [25].

Given our experience in implementing these deep learning methods, we see that a high quality vector representation of a software object or artifact can greatly improve the performance on typical software engineering problems that can be formulated as classification problems. Further research is needed in software engineering to improve the representation of such software artifacts and relationships for deep learning-based applications [65].
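As a hedged illustration of how a convolutional layer and max-pooling turn a variable-length description into a fixed-length vector for multi-label tag prediction, here is a minimal PyTorch sketch in the spirit of TagCNN; the layer sizes and vocabulary are illustrative, and this is not the exact architecture or configuration used in our experiments.

# Minimal sketch (not the exact TagCNN configuration): word embeddings ->
# 1-D convolution -> max-pooling over time -> per-tag sigmoid scores.
import torch
import torch.nn as nn

class TinyTagCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, num_filters=128,
                 kernel_size=3, num_tags=500):        # illustrative sizes
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size)
        self.fc = nn.Linear(num_filters, num_tags)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2) # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))                  # (batch, filters, seq_len - 2)
        x = x.max(dim=2).values                       # max-pooling over positions
        return self.fc(x)                             # tag logits (batch, num_tags)

model = TinyTagCNN()
logits = model(torch.randint(0, 10000, (2, 50)))      # two fake descriptions
probs = torch.sigmoid(logits)                         # independent per-tag scores
# Training would use nn.BCEWithLogitsLoss against the multi-hot tag vectors,
# and the top-k highest-scoring tags would be recommended for each object.
print(probs.shape)                                    # torch.Size([2, 500])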

5.2. Threats to validity

There are several threats that can potentially affect the validity of our experimental results.

Experimental Errors. Threats to internal validity relate to errors in our experimental data, approach implementations and settings. We have double checked our experimental datasets, approach implementations or re-implementations, and settings; however, there could be experimental errors that we did not detect.

Biased Results. Our tag recommendation task assumes that the existing tags in a software information site are correct. However, human errors are inevitable. We apply some filtering rules, such as removing rare tags and software objects, to alleviate this problem. These filtering rules have also been used in past research [14,15,26,27]. However, issues such as how to deal with a large number of synonymous tags cannot be completely solved.

Generalizability. In this paper, we evaluated our four approaches and three traditional approaches on ten software information sites. There are more than 11,000,000 software objects in the ten datasets, and the number of software objects per dataset ranges from about 40K to more than ten million. Even so, more case studies are needed to reduce this external validity threat. In the future, more software information sites need to be used.

Evaluation Metrics. In this paper, Recall@k, Precision@k and F1-score@k are used as the evaluation metrics for the algorithms. Recall@k and Precision@k have been used in the past to evaluate the performance of tag recommendation for software information sites [14–16] and for social media and networks [66–69]. The computational cost of constructing the tag recommendation models is also reported in this paper. Because of differences in operating systems, hardware, development environments and other factors, the reported computational costs may not be exactly reproducible when repeating our experiments.

Scalability. We applied our algorithms to one large software information site, StackOverflow. Further scalability evaluation is needed to determine whether both the on-line tag recommendation speed and the off-line training costs are acceptable for very large software information sites.

Model Components. We only utilized the text part of the description of software objects; we removed code snippets and screenshots from the description. We will look to utilize these code snippets and screenshots in the description to extend our models in future work.

6. Conclusion

In this paper, we proposed four new deep learning-based methods, TagCNN, TagRNN, TagHAN and TagRCNN, for automated tag recommendation for large software information sites. The purpose of our study is to investigate whether there are deep learning methods that can



achieve better performance than the traditional approaches used for the tag recommendation task for software information sites. Our experimental results have shown that:

• For the tag recommendation task, two of our approaches, TagCNN and TagRCNN, which are based on the deep learning models CNN and RCNN respectively, achieved significant performance improvements over the three traditional approaches EnTagRec, TagMulRec, and FastTagRec. In contrast, the other two approaches, TagRNN and TagHAN, which are based on the deep learning models RNN and HAN respectively, were outperformed by these traditional approaches.

• Training of the recommendation models can be done off-line and only needs to be done once. Although TagCNN and TagRCNN require longer training time than TagMulRec, the overhead is acceptable for real-world practice.

In summary, based on the results of this study, we believe the use of deep learning can help to find a better approach for some typical Software Engineering problems. Several open challenges remain. Software artifacts and relationships must be processed and represented in a vectorized form suitable for the chosen deep learning algorithm. For large data sets, the deep learning algorithm needs to be scaled across multiple machines, and training may be a very time- and compute-intensive step. Even if this training is done off-line, if retraining is needed often, the scalability of the approach may be in question. As discussed above, we need to apply our new deep learning-based models to further software information sites and extend them to process rich content, including code, design models and images, to more precisely determine their performance relative to more traditional data analytical approaches. Finally, we need to develop other deep learning-based models for other data-intensive software engineering tasks, such as link recommendation, effort estimation, vulnerability detection and so on, to determine how these perform in practice.

Acknowledgments

This work is supported in part by the National Key R&D Program of China (No. 2018YFC1604000), grants from the National Natural Science Foundation of China (61572374, U163620068, U1135005, 61572371, 61772525), the Open Fund of the Key Laboratory of Network Assessment Technology from CAS, the Guangxi Key Laboratory of Trusted Software (No. kx201607), and the Academic Team Building Plan for Young Scholars from Wuhan University (WHU2016012).

References

[1] T.D. Nguyen, A.T. Nguyen, H.D. Phan, T.N. Nguyen, Exploring API embedding for API usages and applications, in: Proceedings of the 39th International Conference on Software Engineering, IEEE Press, 2017, pp. 438–449.

[2] V.J. Hellendoorn, P. Devanbu, Are deep neural networks the best choice for modeling source code? in: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ACM, 2017, pp. 763–773.

[3] M. White, M. Tufano, C. Vendome, D. Poshyvanyk, Deep learning code fragments for code clone detection, in: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ACM, 2016, pp. 87–98.

[4] J. Guo, J. Cheng, J. Cleland-Huang, Semantically enhanced software traceability using deep learning techniques, in: Proceedings of the 39th International Conference on Software Engineering, IEEE Press, 2017, pp. 3–14.

[5] B. Xu, D. Ye, Z. Xing, X. Xia, G. Chen, S. Li, Predicting semantically linkable knowledge in developer online forums via convolutional neural network, in: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ACM, 2016, pp. 51–62.

[6] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[7] A.-r. Mohamed, G.E. Dahl, G. Hinton, Acoustic modeling using deep belief networks, IEEE Trans. Audio Speech Lang. Process. 20 (1) (2012) 14–22.

[8] A. Graves, A.-r. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 6645–6649.

[9] T.N. Sainath, O. Vinyals, A. Senior, H. Sak, Convolutional, long short-term memory, fully connected deep neural networks, in: Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, IEEE, 2015, pp. 4580–4584.


[10] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, arXiv:1412.3555 (2014).

[11] A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv:1511.06434 (2015).

[12] W. Fu, T. Menzies, Easy over hard: a case study on deep learning, in: Joint Meeting on Foundations of Software Engineering, ACM, 2017, pp. 49–60.

[13] F.P. Brooks, No silver bullet, IEEE Comput. 20 (4) (1987) 10–19.

[14] X. Xia, D. Lo, X. Wang, B. Zhou, Tag recommendation in software information sites, in: Proceedings of the 10th Working Conference on Mining Software Repositories, IEEE Press, 2013, pp. 287–296.

[15] S. Wang, D. Lo, B. Vasilescu, A. Serebrenik, EnTagRec: an enhanced tag recommendation system for software information sites, in: ICSME, 2014, pp. 291–300.

[16] J.M. Al-Kofahi, A. Tamrawi, T.T. Nguyen, H.A. Nguyen, T.N. Nguyen, Fuzzy set approach for automatic tagging in evolving software, in: Software Maintenance (ICSM), 2010 IEEE International Conference on, IEEE, 2010, pp. 1–10.

[17] M.-A. Storey, C. Treude, A. van Deursen, L.-T. Cheng, The impact of social media on software engineering practices and tools, in: Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research, ACM, 2010, pp. 359–364.

[18] A. Begel, R. DeLine, T. Zimmermann, Social media for software engineering, in: Proceedings of the FSE/SDP Workshop on Future of Software Engineering Research, ACM, 2010, pp. 33–38.

[19] J. Brandt, P.J. Guo, J. Lewenstein, M. Dontcheva, S.R. Klemmer, Two studies of opportunistic programming: interleaving web foraging, learning, and writing code, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2009, pp. 1589–1598.

[20] L. Guerrouj, S. Azad, P.C. Rigby, The influence of App churn on App success and StackOverflow discussions, in: 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), IEEE, 2015, pp. 321–330.

[21] C. Treude, M.-A. Storey, Work item tagging: communicating concerns in collaborative software development, IEEE Trans. Softw. Eng. 38 (1) (2012) 19–34.

[22] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, arXiv:1404.2188 (2014).

[23] P. Liu, X. Qiu, X. Huang, Recurrent neural network for text classification with multi-task learning, arXiv:1605.05101 (2016).

[24] Z. Yang, D. Yang, C. Dyer, X. He, A.J. Smola, E.H. Hovy, Hierarchical attention networks for document classification, in: HLT-NAACL, 2016, pp. 1480–1489.

[25] S. Lai, L. Xu, K. Liu, J. Zhao, Recurrent convolutional neural networks for text classification, in: AAAI, 333, 2015, pp. 2267–2273.

[26] P. Zhou, J. Liu, Z. Yang, G. Zhou, Scalable tag recommendation for software information sites, in: The 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2017.

[27] J. Liu, P. Zhou, Z. Yang, X. Liu, J. Grundy, FastTagRec: fast tag recommendation for software information sites, Autom. Softw. Eng. (2018).

[28] B. Sigurbjörnsson, R. Van Zwol, Flickr tag recommendation based on collective knowledge, in: Proceedings of the 17th International Conference on World Wide Web, ACM, 2008, pp. 327–336.

[29] S. Rendle, L. Schmidt-Thieme, Pairwise interaction tensor factorization for personalized tag recommendation, in: Proceedings of the Third ACM International Conference on Web Search and Data Mining, ACM, 2010, pp. 81–90.

[30] D. Yin, Z. Xue, L. Hong, B.D. Davison, A probabilistic model for personalized tag prediction, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2010, pp. 959–968.

[31] Q. Wang, L. Ruan, Z. Zhang, L. Si, Learning compact hashing codes for efficient tag completion and prediction, in: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, ACM, 2013, pp. 1789–1794.

[32] R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, G. Stumme, Tag recommendations in folksonomies, in: European Conference on Principles of Data Mining and Knowledge Discovery, Springer, 2007, pp. 506–514.

[33] E.F. Martins, F.M. Belm, J.M. Almeida, M.A. Gonalves, On cold start for associative tag recommendation, J. Assoc. Inf. Sci. Technol. 67 (1) (2016) 83–105.

[34] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.

[35] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, L. Fei-Fei, Large-scale video classification with convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1725–1732.

[36] T. Mikolov, M. Karafiát, L. Burget, J. Cernocký, S. Khudanpur, Recurrent neural network based language model, in: Interspeech, 2, 2010, p. 3.

[37] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, arXiv:1301.3781 (2013).

[38] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of tricks for efficient text classification (2016) 427–431.

[39] J. Pennington, R. Socher, C. Manning, GloVe: global vectors for word representation, in: Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1532–1543.

[40] P. Covington, J. Adams, E. Sargin, Deep neural networks for YouTube recommendations, in: Proceedings of the 10th ACM Conference on Recommender Systems, ACM, 2016, pp. 191–198.

[41] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, et al., Wide & deep learning for recommender systems, in: Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, ACM, 2016, pp. 7–10.

[42] S. Okura, Y. Tagami, S. Ono, A. Tajima, Embedding-based news recommendation for millions of users, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 1933–1942.

[43] X. He, X. Du, X. Wang, F. Tian, J. Tang, T.-S. Chua, Outer product-based neural collaborative filtering, arXiv:1808.03912 (2018).



[44] C.-Y. Wu, A. Ahmed, A. Beutel, A.J. Smola, H. Jing, Recurrent recommender networks, in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, ACM, 2017, pp. 495–503.

[45] H. Ying, F. Zhuang, F. Zhang, Y. Liu, G. Xu, X. Xie, H. Xiong, J. Wu, Sequential recommender system based on hierarchical attention networks, in: The 27th International Joint Conference on Artificial Intelligence, 2018.

[46] R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, in: Proceedings of the 24th International Conference on Machine Learning, ACM, 2007, pp. 791–798.

[47] X. Cai, J. Han, L. Yang, Generative adversarial network based heterogeneous bibliographic network representation for personalized citation recommendation, AAAI, 2018.

[48] I. Munemasa, Y. Tomomatsu, K. Hayashi, T. Takagi, Deep reinforcement learning for recommender systems, in: International Conference on Information and Communications Technology, 2018, pp. 226–233.

[49] M. Peng, G. Zeng, Z. Sun, J. Huang, H. Wang, G. Tian, Personalized App recommendation based on App permissions, World Wide Web 21 (1) (2018) 89–104.

[50] Z. Xu, J. Xuan, J. Liu, X. Cui, MICHAC: defect prediction via feature selection based on maximal information coefficient with hierarchical agglomerative clustering, in: Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, 1, IEEE, 2016, pp. 370–381.

[51] D. Ye, Z. Xing, C.Y. Foo, Z.Q. Ang, J. Li, N. Kapre, Software-specific named entity recognition in software engineering social content, in: Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, 1, IEEE, 2016, pp. 90–101.

[52] M. Mondal, C.K. Roy, K.A. Schneider, SPCP-Miner: a tool for mining code clones that are important for refactoring or tracking, in: Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on, IEEE, 2015, pp. 484–488.

[53] Y. Kim, Convolutional neural networks for sentence classification, arXiv:1408.5882 (2014).

[54] Y. Zhang, B. Wallace, A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification, arXiv:1510.03820 (2015).

[55] M.A. Nielsen, Neural networks and deep learning, 25, Determination Press USA, 2015.


[56] G.E. Dahl, T.N. Sainath, G.E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, IEEE, 2013, pp. 8609–8613.

[57] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv:1409.0473 (2014).

[58] D. Costarelli, R. Spigler, Approximation results for neural network operators activated by sigmoidal functions, Neural Netw. 44 (8) (2013) 101–106.

[59] Q. Song, Z. Jia, M. Shepperd, S. Ying, J. Liu, A general software defect-proneness prediction framework, IEEE Trans. Softw. Eng. 37 (3) (2011) 356–370.

[60] T. Zimmermann, N. Nagappan, Predicting defects using network analysis on dependency graphs, in: Software Engineering, 2008. ICSE'08. ACM/IEEE 30th International Conference on, IEEE, 2008, pp. 531–540.

[61] X. Yang, D. Lo, X. Xia, J. Sun, TLEL: a two-layer ensemble learning approach for just-in-time defect prediction, Inf. Softw. Technol. (2017) 206–220.

[62] Y. Sun, C. Chen, Q. Wang, B. Boehm, Improving missing issue-commit link recovery using positive and unlabeled data, in: IEEE/ACM International Conference on Automated Software Engineering, 2017, pp. 147–152.

[63] L. Mou, G. Li, L. Zhang, T. Wang, Z. Jin, Convolutional neural networks over tree structures for programming language processing, 2014, pp. 1287–1293.

[64] A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, Comput. Sci. (2015).

[65] H.K. Dam, T. Tran, J. Grundy, A. Ghose, DeepSoft: a vision for a deep model of software, in: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM, 2016, pp. 944–947.

[66] E. Zangerle, W. Gassler, G. Specht, Using tag recommendations to homogenize folksonomies in microblogging environments, in: International Conference on Social Informatics, Springer, 2011, pp. 113–126.

[67] H. Wang, B. Chen, W.-J. Li, Collaborative topic regression with social regularization for tag recommendation, IJCAI, 2013.

[68] D. Yang, Y. Xiao, H. Tong, J. Zhang, W. Wang, An integrated tag recommendation algorithm towards Weibo user profiling, in: International Conference on Database Systems for Advanced Applications, Springer, 2015, pp. 353–373.

[69] D. Yang, Y. Xiao, Y. Song, J. Zhang, K. Zhang, W. Wang, Tag propagation based recommendation across diverse social media, in: Proceedings of the 23rd International Conference on World Wide Web, ACM, 2014, pp. 407–408.

