Research Article

Recommendation of Crowdsourcing Tasks Based on Word2vec Semantic Tags

Qingxian Pan,1,2 Hongbin Dong,1 Yingjie Wang,2 Zhipeng Cai,3 and Lizong Zhang4

1 College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
2 School of Computer and Control Engineering, Yantai University, Yantai 264005, China
3 Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
4 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Correspondence should be addressed to Hongbin Dong; donghongbin@hrbeu.edu.cn

Received 1 November 2018; Revised 18 February 2019; Accepted 3 March 2019; Published 24 March 2019

Guest Editor: Michele Nogueira

Copyright © 2019 Qingxian Pan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Crowdsourcing is a showcase of collective intelligence, and the key to completing a crowdsourcing task well is to allocate the appropriate task to the appropriate worker. At present, most crowdsourcing platforms let workers select tasks through task search and lack personalized task recommendation. This paper proposes a tag-semantic task recommendation model based on deep learning. The similarity of word vectors is computed, and a semantic tag similarity matrix database is established based on Word2vec deep learning. The task recommendation model is then built on semantic tags to achieve personalized recommendation of crowdsourcing tasks. By computing the similarity of tags, the relevance between tasks and workers is obtained, which improves the robustness of task recommendation. Comparison experiments on the Tianpeng web dataset verify the effectiveness and applicability of the proposed model.

1. Introduction

Deep learning was proposed by Geoffrey Hinton et al. in 2006. This method simulates the neural networks of the human brain to model and realize multiple levels of abstraction [1, 2]. Also in 2006, Jeff Howe, a reporter for the American magazine Wired, proposed the concept of crowdsourcing [3]. As a new kind of business model, crowdsourcing has attracted widespread attention in various fields and has become a new hot topic in computer science research. A crowdsourcing system is made up of task requesters, the crowdsourcing platform, and workers [4]. The crowdsourcing process includes designing tasks, publishing tasks, selecting tasks, sensing tasks, submitting solutions, and integrating solutions. Among these, task selection is the key phase: the key to completing a crowdsourcing task is that the appropriate worker selects the appropriate task at the appropriate time [5].

Popular crowdsourcing platforms rely on keyword-based task search to help workers find the tasks they prefer [6]. However, with the rapid development of crowdsourcing, the problem of information overload is becoming increasingly serious, and it is increasingly difficult for workers to find their preferred crowdsourcing tasks. Recommender systems are an effective means of solving this problem and are used on many e-commerce platforms, such as Alibaba, Amazon, and Netflix [7]. However, many problems in recommender systems remain unsolved, such as similarity calculation, low recommendation accuracy, data sparseness, and cold start. In brief, improving the accuracy and reliability of recommender systems has received growing attention from scholars.

However, there is little research on personalized task recommendation in crowdsourcing, and task selection relies on the hobbies and expertise of workers. Few crowdsourcing platforms can actively recommend tasks. This paper studies a crowdsourcing task recommendation model based on Word2vec semantic tags in order to achieve personalized recommendation of crowdsourcing tasks [8].

The main contributions of this paper are the following three:

Hindawi
Wireless Communications and Mobile Computing
Volume 2019, Article ID 2121850, 10 pages
https://doi.org/10.1155/2019/2121850


Figure 1: The workflow of crowdsourcing. [Diagram labels: Platform-Server; Requester: Task Design, Task Publishing, Receive Answer, Finishing Answer; Workers: Task Selection, Task Reception, Task Solution, Answer Submission.]

(1) We compute the similarity of word vectors and build the semantic tag similarity matrix database based on Word2vec deep learning.

(2) We study the task recommendation model based on semantic tags to achieve personalized recommendation of crowdsourcing tasks. The similarity of tasks and workers is computed based on the semantic tag similarity matrix.

(3) Experiments are conducted on the Tianpeng web dataset. The experimental results show that the model is feasible and effective. The model can be used in other fields with different semantic databases.

This paper is organized as follows. Section 2 reviews the related works. Word2vec is discussed in Section 3. The task recommendation model and its realization method based on semantic tags are presented in Section 4. The comparison experiments and the analysis of the experimental results are introduced in Section 5. The conclusion is presented in Section 6.

2. Related Works

To discuss the related works on crowdsourcing recommendation, we introduce the related works on crowdsourcing and on recommender systems in turn.

2.1. Crowdsourcing. In 2006, Jeff Howe first proposed the concept of crowdsourcing [3]: a company or an institution outsources tasks formerly performed by employees to an unspecified public network in a free and voluntary manner. With the development of crowdsourcing technology, different definitions of crowdsourcing have appeared. Chen et al. [9] summarized 40 different crowdsourcing definitions. Feng et al. [10] defined crowdsourcing according to its basic features. According to this definition, crowdsourcing is a distributed problem-solving mechanism open to the Internet public, and it completes tasks that are difficult for a computer to complete by integrating computers and the unknown public on the Internet [11].

Crowdsourcing has been successfully applied in language translation, image recognition, intelligent transportation, software development, entry interpretation, tourism photography, and other fields, and it has become the perfect embodiment of group wisdom [12, 13]. A crowdsourcing system is made up of the task requester, the crowdsourcing platform, and workers. The crowdsourcing workflow includes designing tasks by the task requester, publishing tasks, selecting tasks by workers, solving tasks, submitting answers, and arranging answers. The workflow of crowdsourcing is shown in Figure 1. Public participation is the basis of crowdsourcing, and the key to completing crowdsourcing tasks with high quality is to recommend appropriate tasks to appropriate workers at the appropriate time [14].

2.2. Recommender Systems. With the arrival of the big data era, the problem of information overload is increasingly serious, and finding the most useful information is increasingly difficult. Recommender systems are an effective means of solving these problems [15]. However, recommender systems have some inherent defects, such as low accuracy, data sparseness, cold start, the defects of centralized systems, similarity calculation, and vulnerability to attack. In addition, many recommender systems are applied in business systems whose purpose is to sell more goods and seek maximum benefit rather than to recommend the best commodities to users. In brief, the credibility and accuracy of recommender systems need to be improved, which has attracted the attention of scholars. Yang et al. [16] proposed a recommender system based on transfer learning. Chen et al. [17] proposed a recommender system based on bind context. Tang et al. [18] researched recommender systems based on crossing knowledge. Liu [19] and Zhou et al. [20] researched recommender systems for social recommendation. Combining Markov and the social attributes of users, Wang et al. [21] proposed a probability-based recommendation model to recommend items to users.


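Figure 2: CBOW model and Skip-gram model. [Diagram: both panels have Input, Projection, and Output layers. (a) CBOW model: the context words w(t-2), w(t-1), w(t+1), and w(t+2) are combined by SUM to predict w(t). (b) Skip-gram model: w(t) is used to predict w(t-2), w(t-1), w(t+1), and w(t+2).]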

Crowdsourcing task recommendation is studied mainly from the perspective of the crowdsourcing platform. Based on the task discovery model, the crowdsourcing platform recommends related tasks according to the preferences of workers [5]. The main crowdsourcing platforms basically adopt task search and rarely adopt task recommendation [22]. Some task recommendation methods have been researched based on traditional recommendation methods, including content-based recommendation, collaborative filtering, and hybrid recommendation algorithms. Ambati et al. [23] proposed using the historical information of tasks and workers for task recommendation. Yuen et al. [24] proposed a worker-task recommendation model that combines the historical information of workers with their browsing history. Deng et al. [25] researched the problem of maximizing task selection for spatiotemporal tasks.

3. Word2vec

In 2003, Bengio et al. [26] proposed the Neural Network Language Model (NNLM), which is based on three layers. NNLM is used to compute the probability $p(w_t = i \mid context)$ of the next word $w_t$ given a context, and the word vectors are a byproduct of training. Word2vec is a deep-learning-based tool for computing the similarity of word vectors, which was proposed by Google in 2013 [27]. It converts words into word vectors and computes similarity according to the cosine between word vectors. When using the tool, the segmented texts are the input, and the output word vectors can be used for many Natural Language Processing (NLP) tasks, such as clustering, finding synonyms, and part-of-speech analysis.
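As a minimal illustration of the cosine computation mentioned above, the following sketch assumes that word vectors are available as NumPy arrays. The function names `cosine_similarity` and `tag_similarity_matrix` are illustrative and are not part of the Word2vec tool; returning 0 for a zero vector is a convention chosen here.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return float(np.dot(u, v) / denom)

def tag_similarity_matrix(vectors):
    """Pairwise cosine similarities for a list of tag vectors (one row per tag)."""
    V = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid division by zero for empty vectors
    U = V / norms
    return U @ U.T
```

The resulting matrix is symmetric with ones on the diagonal, which matches the properties that a tag similarity matrix built from cosine values would be expected to have.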

Word2vec uses a word vector representation based on distributed representation, which was proposed by Hinton in 1986 [28]. Its basic idea is to map each word to a $k$-dimensional real vector by training ($k$ is a hyperparameter of the model) and to judge the semantic similarity between words according to the distance between their vectors (such as cosine similarity or Euclidean distance). It uses a three-layer neural network: input layer, hidden layer, and output layer. Its core technique is to use Huffman coding according to word frequency, so that words with similar frequencies activate basically the same content in the hidden layer. The higher the frequency of a word, the fewer hidden-layer nodes it activates, which effectively reduces the computational complexity.
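To make the role of Huffman coding concrete, the sketch below builds Huffman codes from word frequencies with Python's standard `heapq` module. It is only an illustration under stated assumptions: the frequencies are invented, the function name `huffman_codes` is hypothetical, and the choice of which branch receives bit 0 or 1 is arbitrary and need not match the Word2vec implementation.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {word: bitstring} built from a {word: frequency} mapping."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so that dicts are never compared
    heap = [(f, next(tiebreak), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # least frequent subtree
        f2, _, right = heapq.heappop(heap)   # second least frequent subtree
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Illustrative frequencies (not taken from the paper's corpus).
codes = huffman_codes({"task": 50, "worker": 30, "tag": 15, "crowd": 5})
```

In this example the most frequent word receives the shortest code, so its path from the root of the Huffman tree passes through the fewest nodes.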

Compared with Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA), Word2vec uses the context of words, which makes the semantic information richer. Word2vec has two training models, CBOW (Continuous Bag-of-Words) and Skip-gram, which are shown in Figure 2. Both models include an input layer, a projection layer, and an output layer. The CBOW model predicts the current word from its known context, and the Skip-gram model predicts the context from the current word.
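The difference between the two models can be illustrated by the training examples that each one would draw from a token sequence. The following is a minimal sketch; the helper names and the window size of 2 (matching the w(t-2) to w(t+2) span of Figure 2) are assumptions made for illustration.

```python
def context_window(tokens, t, window):
    """Words within `window` positions of index t, excluding tokens[t]."""
    lo = max(0, t - window)
    hi = min(len(tokens), t + window + 1)
    return [tokens[i] for i in range(lo, hi) if i != t]

def cbow_examples(tokens, window=2):
    """CBOW: (context words, current word) pairs."""
    return [(context_window(tokens, t, window), tokens[t])
            for t in range(len(tokens))]

def skipgram_examples(tokens, window=2):
    """Skip-gram: (current word, one context word) pairs."""
    return [(tokens[t], c)
            for t in range(len(tokens))
            for c in context_window(tokens, t, window)]
```

For the tokens `["a", "b", "c", "d", "e"]`, CBOW yields one example per position, such as `(["a", "b", "d", "e"], "c")`, whereas Skip-gram yields one example per (current word, context word) pair.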

In this paper, the objective optimization function of CBOW is expressed by

$$p(w \mid Context(w)) = \prod_{j=2}^{l^w} p(d_j^w \mid x_w, \theta_{j-1}^w) \tag{1}$$

where $x_w$ is the word vector at the root node of the Huffman tree, $Context(w)$ is the context of word $w$, that is, the collection of words surrounding $w$, $l^w$ is the number of nodes on the path $p^w$, $d_j^w \in \{0, 1\}$ is the Huffman code of the word $w$, and $\theta_1^w, \theta_2^w, \ldots, \theta_{j-1}^w \in \mathbb{R}^m$ are the vectors corresponding to the nonleaf nodes of the path $p^w$. Therefore, the logistic regression probability $p(d_j^w \mid x_w, \theta_{j-1}^w)$ that $w$ passes node $j$ in the Huffman tree is given by (2). The corresponding term $\sigma(x_w^T \theta_{j-1}^w)$ is given by (3).

$$p(d_j^w \mid x_w, \theta_{j-1}^w) = \begin{cases} \sigma(x_w^T \theta_{j-1}^w), & d_j^w = 0 \\ 1 - \sigma(x_w^T \theta_{j-1}^w), & d_j^w = 1 \end{cases} \tag{2}$$

$$\sigma(x_w^T \theta_{j-1}^w) = \frac{1}{1 + e^{-x_w^T \theta_{j-1}^w}} \tag{3}$$


To represent the logistic regression probability $p(d_j^w \mid x_w, \theta_{j-1}^w)$ compactly, we combine (2) and (3) to obtain

$$p(d_j^w \mid x_w, \theta_{j-1}^w) = [\sigma(x_w^T \theta_{j-1}^w)]^{1 - d_j^w} \cdot [1 - \sigma(x_w^T \theta_{j-1}^w)]^{d_j^w} \tag{4}$$
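Equations (1) to (4) can be checked numerically with a short sketch. The code below is an illustration only, assuming NumPy vectors for $x_w$ and for the nonleaf node vectors; the function names are hypothetical, and the convention that bit 0 uses $\sigma$ and bit 1 uses $1 - \sigma$ follows (2).

```python
import numpy as np

def sigmoid(z):
    """Eq. (3): logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def node_probability(d, x_w, theta):
    """Eq. (4): p(d | x_w, theta) for one branch decision d in {0, 1}."""
    s = sigmoid(np.dot(x_w, theta))
    return s ** (1 - d) * (1.0 - s) ** d

def path_probability(code, x_w, thetas):
    """Eq. (1): product of node probabilities along the Huffman path.

    code   -- Huffman bits d_2, ..., d_l of the word w
    thetas -- vectors theta_1, ..., theta_{l-1} of the nonleaf nodes
    """
    p = 1.0
    for d, theta in zip(code, thetas):
        p *= node_probability(d, x_w, theta)
    return float(p)
```

At every node the two branch probabilities sum to 1, so the probabilities of all leaves of the tree also sum to 1 without an explicit normalization over the vocabulary.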

To avoid the value of $p(w \mid Context(w))$ becoming too small, the log-likelihood function is used as the objective function; thus (1) can be converted into

$$L = \sum_{w \in C} \log p(w \mid Context(w)) \tag{5}$$

By combining (4) and (5), the objective function $L$ is given by

$$\begin{aligned} L &= \sum_{w \in C} \log \prod_{j=2}^{l^w} \left\{ [\sigma(x_w^T \theta_{j-1}^w)]^{1 - d_j^w} \cdot [1 - \sigma(x_w^T \theta_{j-1}^w)]^{d_j^w} \right\} \\ &= \sum_{w \in C} \sum_{j=2}^{l^w} \left\{ (1 - d_j^w) \cdot \log [\sigma(x_w^T \theta_{j-1}^w)] + d_j^w \cdot \log [1 - \sigma(x_w^T \theta_{j-1}^w)] \right\} \end{aligned} \tag{6}$$

Therefore, (6) is the objective function of CBOW in this paper. Word2vec uses the stochastic gradient ascent method to optimize the objective function of CBOW.
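A single gradient ascent step on the inner sum of (6) can be sketched as follows. This is a simplified illustration rather than the Word2vec implementation: it treats $x_w$ as one vector (in CBOW it is built from the context word vectors), the learning rate `lr` is an arbitrary assumption, and the function names are hypothetical. The gradient of one term with respect to $x_w^T \theta_{j-1}^w$ is $1 - d_j^w - \sigma(x_w^T \theta_{j-1}^w)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(code, x_w, thetas):
    """Inner sum of Eq. (6) for one word w."""
    total = 0.0
    for d, theta in zip(code, thetas):
        s = sigmoid(np.dot(x_w, theta))
        total += (1 - d) * np.log(s) + d * np.log(1.0 - s)
    return float(total)

def ascent_step(code, x_w, thetas, lr=0.05):
    """One gradient ascent step; returns the updated (x_w, thetas)."""
    x_w = np.asarray(x_w, dtype=float)
    grad_x = np.zeros_like(x_w)
    new_thetas = []
    for d, theta in zip(code, thetas):
        g = 1 - d - sigmoid(np.dot(x_w, theta))  # derivative w.r.t. x_w . theta
        grad_x += g * theta                      # uses theta before its update
        new_thetas.append(theta + lr * g * x_w)
    return x_w + lr * grad_x, new_thetas
```

With a small learning rate, one such step increases the log-likelihood of the observed Huffman code, which is the behaviour expected of an ascent method.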

4. The Task Recommendation Model and Realization Method Based on Semantic Tags

4.1. Basic Model Frame and Mathematical Computation Model.

The core of the model is the tag similarity matrix. The model uses the tag similarity matrix to compute the similarity of workers and tasks, produces the worker-task similarity matrix, and realizes task recommendation or worker recommendation. In the model, the tag similarity matrix is computed with Word2vec. The worker-tag matrix is obtained from the worker's work history, registration information, etc., and the task-tag matrix is obtained from the task description, task classification, etc.

Define the tag similarity matrix $L \in \mathbb{R}^{m \times m}$ as

$$L = \begin{bmatrix} l_{11} & \cdots & l_{1m} \\ \vdots & \ddots & \vdots \\ l_{m1} & \cdots & l_{mm} \end{bmatrix}$$

$L$ is a symmetric matrix, that is, $l_{ij} = l_{ji}$, where $l_{ij} \in [0, 1]$ represents the similarity of tag $i$ and tag $j$; its value is computed with the Word2vec tool.

Define the worker-tag matrix $W \in \mathbb{R}^{n \times m}$ as

$$W = \begin{bmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix}, \qquad w_{ij} = \begin{cases} 1, & \text{worker } i \text{ has tag } j \\ 0, & \text{worker } i \text{ does not have tag } j \end{cases}$$

We define the task-tag matrix $T \in \mathbb{R}^{p \times m}$ as

$$T = \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{p1} & \cdots & t_{pm} \end{bmatrix}, \qquad t_{ij} = \begin{cases} 1, & \text{task } i \text{ has tag } j \\ 0, & \text{task } i \text{ does not have tag } j \end{cases}$$

Therefore, the worker-task similarity matrix $WT$ is obtained by (7), where $W$ is the worker-tag matrix, $L$ is the tag similarity matrix, and $T^T$ is the transpose of the task-tag matrix. Through (7), the relationship between workers and tasks can be obtained.

$$WT = W \times L \times T^T = \begin{bmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix} \times \begin{bmatrix} l_{11} & \cdots & l_{1m} \\ \vdots & \ddots & \vdots \\ l_{m1} & \cdots & l_{mm} \end{bmatrix} \times \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{p1} & \cdots & t_{pm} \end{bmatrix}^T \tag{7}$$
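Equation (7) is a plain matrix product and can be sketched with NumPy as follows. The function name `worker_task_similarity` and the small example matrices are assumptions made for illustration; they are not data from the experiments.

```python
import numpy as np

def worker_task_similarity(W, L, T):
    """Eq. (7): WT = W x L x T^T, with shapes (n x m), (m x m), (p x m)."""
    W = np.asarray(W, dtype=float)
    L = np.asarray(L, dtype=float)
    T = np.asarray(T, dtype=float)
    n, m = W.shape
    p, m_t = T.shape
    if L.shape != (m, m) or m_t != m:
        raise ValueError("expected W (n x m), L (m x m), T (p x m)")
    return W @ L @ T.T

# Illustrative data: 3 tags, 2 workers, 2 tasks.
L = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.0]]
W = [[1, 0, 0],   # worker 1 has tag 1
     [0, 1, 1]]   # worker 2 has tags 2 and 3
T = [[0, 1, 0],   # task 1 has tag 2
     [0, 0, 1]]   # task 2 has tag 3
WT = worker_task_similarity(W, L, T)
```

In this example worker 1 shares no tag with task 1, yet obtains a score of 0.5 because tag 1 and tag 2 are similar. This is how the tag similarity matrix lets semantically related tags contribute to the worker-task score.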

4.2. Basic Flow. The main steps of the proposed recommendation model are as follows: (1) computing the word vectors based on Word2vec; (2) computing the similarity of word vectors; (3) generating the tag similarity matrix; (4) obtaining the worker-tag matrix and task-tag matrix; (5) computing the worker-task similarity matrix; (6) $L_2$ standardization and normalization; (7) task and worker recommendation. Tag similarity matrix generation uses the Word2vec tool. Worker-task similarity computation uses the mathematical method introduced in the previous section. This section mainly introduces the standardization and normalization methods.

$L_2$ standardization method: the $L_2$ norm of a vector $x = (x_1, x_2, \ldots, x_n)$ is defined as $norm(x) = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$.

To normalize $x$ to unit $L_2$ norm, a mapping from $x$ to $x'$ is established so that the $L_2$ norm of $x'$ is 1, as verified by

$$1 = norm(x') = \frac{\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}}{norm(x)} = \sqrt{x_1'^2 + x_2'^2 + \cdots + x_n'^2} = \sqrt{\left(\frac{x_1}{norm(x)}\right)^2 + \left(\frac{x_2}{norm(x)}\right)^2 + \cdots + \left(\frac{x_n}{norm(x)}\right)^2} \tag{8}$$

where the value of $x_i'$ is given by

$$x_i' = \frac{x_i}{norm(x)} \tag{9}$$

To make the data standardized and comparable, the $L_2$-standardized data are then normalized so that they fall in the interval $[0, 1]$. The conversion formula is given by (10), where $\min(X)$ is the minimum in $X$ and $\max(X)$ is the maximum in $X$.

$$x_i' = \frac{x_i - \min(X)}{\max(X) - \min(X)} \tag{10}$$
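The two steps in (9) and (10) can be sketched for a single vector as follows. This is a minimal illustration assuming NumPy; whether the paper applies the steps per row, per column, or to the whole worker-task matrix is not stated, so the functions operate on whatever array they are given. The handling of zero and constant vectors is a convention chosen here.

```python
import numpy as np

def l2_standardize(x):
    """Eq. (9): divide by the L2 norm so that the result has unit norm."""
    x = np.asarray(x, dtype=float)
    n = np.linalg.norm(x)
    return x if n == 0.0 else x / n  # assumption: leave a zero vector unchanged

def min_max_normalize(x):
    """Eq. (10): rescale the values to the interval [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)  # assumption: map constant data to 0
    return (x - lo) / (hi - lo)
```

For example, `l2_standardize([3, 4])` gives `[0.6, 0.8]`, whose norm is 1, and `min_max_normalize([0.6, 0.8])` gives `[0, 1]`.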


Table 1: Word2vec parameter setting.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| window | 8 | hs | 1 |
| size | 100 | cbow | yes |
| threads | 20 | alpha | 0.001 |
| binary | 0 | negative | 25 |

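Table 2: Tag similarity matrix L of the simulation dataset.

| | L1 | L2 | L3 | L4 | L5 | L6 | L7 |
|---|---|---|---|---|---|---|---|
| L1 | 1.000 | 0.407 | 0.124 | 0.119 | 0.126 | 0.434 | 0.075 |
| L2 | 0.407 | 1.000 | 0.766 | 0.917 | 0.993 | 0.642 | 0.546 |
| L3 | 0.124 | 0.766 | 1.000 | 0.930 | 0.477 | 0.526 | 0.744 |
| L4 | 0.119 | 0.917 | 0.930 | 1.000 | 0.909 | 0.531 | 0.394 |
| L5 | 0.126 | 0.993 | 0.477 | 0.909 | 1.000 | 0.636 | 0.860 |
| L6 | 0.434 | 0.642 | 0.526 | 0.531 | 0.636 | 1.000 | 0.166 |
| L7 | 0.075 | 0.546 | 0.744 | 0.394 | 0.860 | 0.166 | 1.000 |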

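Table 3: Worker-tag matrix W.

| | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 | L10 | L11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| W1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| W2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| W3 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| W4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| W5 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |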

5. Experiment and Simulation

In this section, we conduct comparison experiments on a simulation dataset and a real dataset, respectively. The real dataset was crawled from the Tianpeng web site.

In the experiments, text8 is the training corpus, and the experimental environment is an Intel Core(TM) i5-337U dual-core CPU at 1.8 GHz with 8 GB of memory.

5.1. The Experiments Conducted on Simulation Dataset. In this group of comparison experiments, the training parameters are shown in Table 1.

In addition, the tag similarity matrix obtained after training is shown in Table 2. The elements of the matrix indicate the similarities between tags.

In this group of experiments, there are 100 workers, 50 tasks, and 2000 tags. The worker-tag matrix is generated randomly and is shown in Table 3; its elements indicate whether a worker has a tag. The task-tag matrix is shown in Table 4; its elements indicate whether a task has a tag. After the worker-task matrix is computed, its standardized and normalized form is shown in Table 5. The elements in Table 5 are the similarities between workers and tasks.

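Table 4: Task-tag matrix T.

| | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 | L10 | L11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| T2 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| T3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| T4 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| T5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |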

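Table 5: Worker-task similarity matrix.

| | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
|---|---|---|---|---|---|---|---|
| U1 | 0.754 | 0.410 | 0.369 | 0.438 | 0.365 | 0.420 | 0.396 |
| U2 | 0.680 | 0.694 | 0.712 | 0.706 | 0.682 | 0.747 | 0.720 |
| U3 | 0.387 | 0.378 | 0.407 | 0.403 | 0.385 | 0.678 | 0.351 |
| U4 | 0.405 | 0.304 | 0.681 | 0.731 | 0.733 | 0.835 | 0.704 |
| U5 | 0.279 | 0.278 | 0.696 | 0.284 | 0.294 | 0.265 | 0.324 |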

Recall, precision, and F-measure are commonly used evaluation indexes [29]. The three indexes are computed by (11), (12), and (13). According to (11), (12), and (13), the F-measure is a comprehensive index that considers both recall and precision.

$$\text{Recall} = \frac{\text{the quantity of related information retrieved}}{\text{the quantity of related information in system}} \tag{11}$$

$$\text{Precision} = \frac{\text{the quantity of related information retrieved}}{\text{the quantity of all information retrieved}} \tag{12}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{13}$$
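The three indexes in (11) to (13) can be computed from two sets, as in the sketch below. The function name and the worker identifiers are hypothetical, and returning 0 when a denominator would be zero is a convention chosen here rather than something specified in the paper.

```python
def recall_precision_f(retrieved, relevant):
    """Eqs. (11)-(13) for one task, given the retrieved and the relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # related information retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return recall, precision, f

# Illustrative example: 4 workers recommended, 3 workers actually relevant.
r, p, f = recall_precision_f({"w1", "w2", "w3", "w4"}, {"w2", "w3", "w5"})
```

Here two of the three relevant workers are retrieved, so the recall is 2/3, the precision is 2/4, and the F-measure is 4/7.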

The threshold values are 0.55, 0.6, and 0.65, respectively, and the recall, precision, and F-measure of the 50 tasks are obtained. The comparison results on the recall, precision, and F-measure indexes are shown in Figures 3, 4, and 5, respectively. In these experiments, the x-coordinate indicates the tasks T, and the y-coordinates are the recall rate, precision rate, and F-measure rate, respectively. The experimental results show that, overall, threshold = 0.6 performs better than the other two thresholds.
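How a threshold turns the normalized worker-task matrix into recommendations can be sketched as follows. The sketch uses the values of Table 5 read as decimals; the function name is hypothetical, and treating a score equal to the threshold as a match (`>=`) is an assumption, since the paper does not state how ties are handled.

```python
import numpy as np

def recommend_workers(similarity, threshold):
    """For each task (column), list the worker indices whose score >= threshold."""
    S = np.asarray(similarity, dtype=float)
    return [np.flatnonzero(S[:, j] >= threshold).tolist()
            for j in range(S.shape[1])]

# Rows U1..U5, columns T1..T7 (Table 5).
S = [[0.754, 0.410, 0.369, 0.438, 0.365, 0.420, 0.396],
     [0.680, 0.694, 0.712, 0.706, 0.682, 0.747, 0.720],
     [0.387, 0.378, 0.407, 0.403, 0.385, 0.678, 0.351],
     [0.405, 0.304, 0.681, 0.731, 0.733, 0.835, 0.704],
     [0.279, 0.278, 0.696, 0.284, 0.294, 0.265, 0.324]]
rec = recommend_workers(S, 0.6)
```

With threshold 0.6, task T1 is recommended to U1 and U2 (indices 0 and 1), and task T3 to U2, U4, and U5. Lowering the threshold can only enlarge each list, which is the sense in which a lower threshold surfaces additional potential workers.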

In addition, we compare the proposed method with the task search method. The experimental result is shown in Figure 6, where the x-coordinate indicates the tasks T and the y-coordinate is the number of workers. The method used in this paper performs better than the task search method, which supports the effectiveness of the proposed method. In addition, potential workers can be found by lowering the threshold, which can be used to analyze potential users.

Table 6: Tag similarity matrix L of the Tianpeng dataset.

| | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 | L10 | L11 | L12 | L13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L1 | 1.000 | 0.004 | -0.041 | 0.100 | 0.018 | 0.048 | 0.020 | -0.040 | -0.049 | -0.029 | -0.009 | 0.038 | -0.026 |
| L2 | 0.004 | 1.000 | 0.803 | 0.261 | 0.882 | 0.225 | 0.493 | 0.610 | 0.391 | 0.315 | 0.817 | 0.666 | 0.601 |
| L3 | -0.041 | 0.803 | 1.000 | 0.231 | 0.761 | 0.166 | 0.393 | 0.533 | 0.351 | 0.259 | 0.722 | 0.603 | 0.609 |
| L4 | 0.100 | 0.261 | 0.231 | 1.000 | 0.268 | 0.134 | 0.234 | 0.176 | 0.191 | 0.229 | 0.248 | 0.251 | 0.173 |
| L5 | 0.018 | 0.882 | 0.761 | 0.268 | 1.000 | 0.218 | 0.475 | 0.571 | 0.352 | 0.228 | 0.753 | 0.659 | 0.583 |
| L6 | 0.048 | 0.225 | 0.166 | 0.134 | 0.218 | 1.000 | 0.133 | 0.135 | 0.101 | 0.095 | 0.198 | 0.222 | 0.181 |
| L7 | 0.020 | 0.493 | 0.393 | 0.234 | 0.475 | 0.133 | 1.000 | 0.334 | 0.190 | 0.192 | 0.504 | 0.459 | 0.296 |
| L8 | -0.040 | 0.610 | 0.533 | 0.176 | 0.571 | 0.135 | 0.334 | 1.000 | 0.295 | 0.258 | 0.556 | 0.480 | 0.515 |
| L9 | -0.049 | 0.391 | 0.351 | 0.191 | 0.352 | 0.101 | 0.190 | 0.295 | 1.000 | 0.248 | 0.386 | 0.239 | 0.277 |
| L10 | -0.029 | 0.315 | 0.259 | 0.229 | 0.228 | 0.095 | 0.192 | 0.258 | 0.248 | 1.000 | 0.238 | 0.288 | 0.236 |
| L11 | -0.009 | 0.817 | 0.722 | 0.248 | 0.753 | 0.198 | 0.504 | 0.556 | 0.386 | 0.238 | 1.000 | 0.616 | 0.535 |
| L12 | 0.038 | 0.666 | 0.603 | 0.251 | 0.659 | 0.222 | 0.459 | 0.480 | 0.239 | 0.288 | 0.616 | 1.000 | 0.455 |
| L13 | -0.026 | 0.601 | 0.609 | 0.173 | 0.583 | 0.181 | 0.296 | 0.515 | 0.277 | 0.236 | 0.535 | 0.455 | 1.000 |

Figure 3: Recall of different thresholds. [Plot: x-axis, tasks T1 to T50; y-axis, recall from 0 to 0.8; series: threshold = 0.55, threshold = 0.6, threshold = 0.65.]

5.2. The Experiments Conducted on Tianpeng Dataset. The data collected from the Tianpeng web site were used to form a training corpus, and the resulting tag similarity matrix is shown in Table 6.

We select 510 workers and 371 tasks from the Tianpeng dataset as experimental objects. Using this dataset, we conduct comparison experiments to verify the effectiveness of the proposed model. In the comparison experiments, 0.6 is taken as the threshold, and 20 tasks are randomly selected as recommended objects. The experimental results are compared with bipartite graph matching and the greedy algorithm in terms of the recall, precision, and F-measure indexes.

For the recall index, the comparison result is shown in Figure 7. The x-coordinate indicates the tasks T, and the y-coordinate is the recall rate. The experimental result shows that the proposed recommendation model has the best recall rate compared with the greedy algorithm and bipartite graph matching. In addition, the proposed recommendation model is more stable as T changes.

Figure 8 shows the experimental result on precision rate. Similarly, the x-coordinate indicates the tasks T, and the y-coordinate is the precision rate. In the experimental result, the average precision rate of the proposed recommendation model is better than that of the other two algorithms. From Figure 8, it can be seen that the proposed recommendation model has the best performance on precision rate compared with the greedy algorithm and bipartite graph matching.

Figure 4: Precision of different thresholds. [Plot: y-axis, precision from 0 to 0.7.]

Figure 5: F-measure of different thresholds. [Plot: y-axis, F-measure from 0 to 0.6.]

According to the experimental result on F-measure shown in Figure 9, the proposed recommendation model also has the best performance on F-measure. Because the F-measure is a comprehensive index that considers both recall and precision, we can infer that the proposed recommendation model has the best overall performance compared with the greedy algorithm and the bipartite graph matching algorithm.

The comparison shows that the proposed method significantly outperforms the bipartite graph matching method and the greedy algorithm on the recall and F-measure indexes, while its precision is sometimes higher and sometimes lower. Because the goal is to recommend each task to as many workers who are able to complete it as possible, including potential workers, the precision requirement can be relaxed in the recommendation setting. It can be seen that the method proposed in this paper has practical significance and application value.


Figure 6: Comparison of experimental results. [Plot: y-axis, number of workers from 0 to 50; series: this paper, task search.]

Figure 7: Recall of different methods. [Plot: x-axis, tasks T1 to T20; y-axis, recall from 0.2 to 1; series: this paper, greedy algorithm, bipartite graph matching.]

6. Conclusion

Crowdsourcing is a showcase of group wisdom. It has been applied in many fields as a new business model, and in recent years it has become a new hot research topic in computer science. The key to the success of crowdsourcing is recommending tasks to appropriate workers. This paper proposes a recommendation method based on the tag similarity matrix. The method uses Word2vec to generate the tag similarity matrix and then computes the similarity of workers and tasks. The comparison experiments show that the method is effective and feasible. The recommendation method can be extended to other fields by using different corpora.

Because the key to the success of crowdsourcing is the participation rate of workers, worker-related issues such as reputation mechanisms, preference evolution, and privacy protection have become hot topics in crowdsourcing research. Improving the accuracy of recommender systems by combining them with reputation, preference evolution, and historical information will be the focus of future research.


Figure 8: Precision of different methods. [Plot: y-axis, precision from 0.2 to 1.]

Figure 9: F-measure of different methods. [Plot: y-axis, F-measure from 0.2 to 1.]

Data Availability

The [Tianpeng] dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants No. 61472095, No. 61502410, and No. 61572418, the China Postdoctoral Science Foundation under Grant No. 2017M622691, the National Science Foundation (NSF) under Grants No. 1704287, No. 1252292, and No. 1741277, and the Natural Science Foundation of Sichuan Province under Grant No. 2018HH0075.

References

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[2] Y. Wang, Z. Cai, G. Yin, Y. Gao, X. Tong, and G. Wu, "An incentive mechanism with privacy protection in mobile crowdsourcing systems," Computer Networks, vol. 102, pp. 157–171, 2016.

[3] J. Howe, "The rise of crowdsourcing," Wired Magazine, vol. 14, no. 6, pp. 1–4, 2006.

[4] Z. Cai and X. Zheng, "A private and efficient mechanism for data uploading in smart cyber-physical systems," IEEE Transactions on Network Science and Engineering, p. 1, 2018.

[5] Y. Hu, Y. Wang, Y. Li, and X. Tong, "An incentive mechanism based on multi-attribute reverse auction in mobile crowdsourcing," Sensors, vol. 18, no. 10, p. 3453, 2018.


[6] J. Li, Z. Cai, J. Wang, M. Han, and Y. Li, "Truthful incentive mechanisms for geographical position conflicting mobile crowdsensing systems," IEEE Transactions on Computational Social Systems, vol. 5, no. 2, pp. 324–334, 2018.

[7] R. Katarya and O. P. Verma, "Recent developments in affective recommender systems," Physica A: Statistical Mechanics and its Applications, vol. 461, pp. 182–190, 2016.

[8] K. W. Church, "Emerging trends: Word2Vec," Natural Language Engineering, vol. 23, no. 1, pp. 155–162, 2017.

[9] X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz, "Pairwise ranking aggregation in a crowdsourced setting," in Proceedings of the Sixth ACM International Conference, pp. 193–202, Rome, Italy, February 2013.

[10] J. Feng, G. Li, and J. Feng, "A survey on crowdsourcing," Chinese Journal of Computers, vol. 38, pp. 1713–1726, 2015.

[11] Z. Duan, W. Li, and Z. Cai, "Distributed auctions for task assignment and scheduling in mobile crowdsensing systems," in Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 635–644, GA, USA, June 2017.

[12] Y. Wang, Z. Cai, X. Tong, Y. Gao, and G. Yin, "Truthful incentive mechanism with location privacy-preserving for mobile crowdsourcing systems," Computer Networks, vol. 135, pp. 32–43, 2018.

[13] Y. Wang, Y. Li, Z. Chi, and X. Tong, "The truthful evolution and incentive for large-scale mobile crowd sensing networks," IEEE Access, vol. 6, pp. 51187–51199, 2018.

[14] J. L. Cai, M. Yan, and Y. Li, "Using crowdsourced data in location-based social networks to explore influence maximization," in Proceedings of the 35th Annual IEEE International Conference on Computer Communications, 2016.

[15] P. Resnick and H. R. Varian, "Recommender systems," Communications of the ACM, vol. 40, no. 3, pp. 56–58, 1997.

[16] W. Pan and Q. Yang, "Transfer learning in heterogeneous collaborative filtering domains," Artificial Intelligence, vol. 197, pp. 39–55, 2013.

[17] G. Chen and L. Chen, "Recommendation based on contextual opinions," UMAP 2014, LNCS 8538, pp. 61–73, 2014.

[18] L. Liu, J. Tang, J. Han, and S. Yang, "Learning influence from heterogeneous social networks," Data Mining and Knowledge Discovery, vol. 25, no. 3, pp. 511–544, 2012.

[19] J. Tang, X. Hu, and H. Liu, "Social recommendation: a review," Social Network Analysis and Mining, vol. 3, no. 4, pp. 1113–1133, 2013.

[20] L. Lu, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, "Recommender systems," Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.

[21] Y. Wang, G. Yin, Z. Cai, Y. Dong, and H. Dong, "A trust-based probabilistic recommendation model for social networks," Journal of Network and Computer Applications, vol. 55, pp. 59–67, 2015.

[22] L. Zhang, Z. Cai, and X. Wang, "FakeMask: a novel privacy preserving approach for smartphones," IEEE Transactions on Network and Service Management, vol. 13, no. 2, pp. 335–348, 2016.

[23] V. Ambati, S. Vogel, and J. Carbonell, "Towards task recommendation in micro-task markets," in Proceedings of the 25th AAAI Workshop in Human Computation, pp. 80–83, CA, USA, 2011.

[24] M. C. Yuen, I. King, and K. S. Leung, "Probabilistic matrix factorization in task recommendation in crowdsourcing systems," in Proceedings of the 19th International Conference on Neural Information Processing, pp. 516–525, Springer, Doha, Qatar, 2012.

[25] D. Deng, C. Shahabi, and U. Demiryurek, "Maximizing the number of worker's self-selected tasks in spatial crowdsourcing," in Proceedings of the 21st ACM SIGSPATIAL International Conference, pp. 1–10, FL, USA, November 2013.

[26] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 8th Annual Meeting of the Association for Computational Linguistics, pp. 384–394, Uppsala, Sweden, July 2010.

[27] Y. Yao, X. Li, X. Liu et al., "Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model," International Journal of Geographical Information Science, vol. 31, no. 4, pp. 825–848, 2017.

[28] R. Wang, H. Zhao, B.-L. Lu, M. Utiyama, and E. Sumita, "Bilingual continuous-space language model growing for statistical machine translation," IEEE Transactions on Audio, Speech and Language Processing, vol. 23, no. 7, pp. 1209–1220, 2015.

[29] L. Li, G. Liu, and Q. Liu, "Advancing iterative quantization hashing using isotropic prior," in Proceedings of the International Conference on Multimedia Modelling, pp. 174–184, Springer International Publishing, 2016.



Figure 1: The workflow of crowdsourcing. [Diagram labels: Platform-Server; Requester (Task Design, Task Publishing, Receive Answer, Finishing Answer); Workers (Task Selection, Task Reception, Task Solution, Answer Submission).]

(1) We compute the similarity of word vectors and build a semantic tag similarity matrix database based on Word2vec deep learning.

(2) We study a task recommendation model based on semantic tags to achieve personalized recommendation of crowdsourcing tasks. The similarity between tasks and workers is computed from the semantic tag similarity matrix.

(3) We conduct experiments on the Tianpeng Web dataset. The experimental results show that the model is feasible and effective. The model can also be used in other fields by substituting a different semantic database.
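As an illustration of contribution (1), the following minimal sketch builds a tag similarity matrix from word vectors. The tag names and three-dimensional vectors are hypothetical placeholders; in practice the vectors would come from a trained Word2vec model. Cosine similarity is assumed here as the similarity measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_u * norm_v)


def tag_similarity_matrix(tag_vectors):
    """Return the tag order and the pairwise similarity matrix."""
    tags = list(tag_vectors)
    matrix = [
        [cosine_similarity(tag_vectors[a], tag_vectors[b]) for b in tags]
        for a in tags
    ]
    return tags, matrix


# Hypothetical toy vectors standing in for Word2vec output.
toy_vectors = {
    "translation": [0.9, 0.1, 0.0],
    "proofreading": [0.8, 0.3, 0.1],
    "logo_design": [0.0, 0.2, 0.9],
}
tags, sim = tag_similarity_matrix(toy_vectors)
```

The resulting matrix is symmetric with ones on the diagonal, and tags whose vectors point in similar directions (here "translation" and "proofreading") receive a higher score than unrelated tags.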

This paper is organized as follows. Section 2 reviews the related works. Word2vec is discussed in Section 3. The task recommendation model based on semantic tags and its realization method are presented in Section 4. The comparison experiments and the analysis of the experimental results are introduced in Section 5. The conclusion is presented in Section 6.

2. Related Works

To discuss the related work on the recommendation of crowdsourcing tasks, we introduce the related work on crowdsourcing and on recommender systems in turn.

2.1. Crowdsourcing. In 2006, Jeff Howe first proposed the concept of crowdsourcing [3]: a company or an institution outsources tasks formerly performed by employees to an unspecified public network in a free and voluntary manner. With the development of crowdsourcing technology, different concepts of crowdsourcing appeared. Chen et al. [9] summarized 40 different definitions of crowdsourcing. Feng et al. [10] defined crowdsourcing according to its basic features. According to this definition, crowdsourcing is a distributed problem-solving mechanism open to the Internet public, and it completes tasks that are difficult for a computer alone by integrating computers and the unknown public on the Internet [11].

Crowdsourcing has been successfully applied in language translation, image recognition, intelligent transportation, software development, entry interpretation, tourism photography, and other fields, and it has become a clear embodiment of collective intelligence [12, 13]. A crowdsourcing system is made up of the task requester, the crowdsourcing platform, and the workers. The crowdsourcing workflow consists of task design by the requester, task publishing, task selection by workers, task solving, answer submission, and answer arrangement, as shown in Figure 1. Public participation is the basis of crowdsourcing, and the key to completing crowdsourcing tasks with high quality is to recommend appropriate tasks to appropriate workers at the appropriate time [14].

2.2. Recommender Systems. With the arrival of the big data era, the problem of information overload is increasingly serious, and finding the most useful information is increasingly difficult. Recommender systems are an effective means of addressing these problems [15]. However, recommender systems have some inherent defects, such as low accuracy, data sparseness, cold start, the defects of centralized systems, similarity calculation, and vulnerability to attacks. In addition, many recommender systems are deployed in commercial systems whose purpose is to sell more goods and seek the maximum benefit rather than to recommend the best commodities to users. In brief, the credibility and accuracy of recommender systems need to be improved, which has attracted the attention of scholars. Pan and Yang [16] proposed a recommender system based on transfer learning. Chen and Chen [17] proposed a recommender system based on contextual opinions. Liu et al. [18] studied recommender systems based on cross-domain knowledge. Tang et al. [19] and Lu et al. [20] studied recommender systems for social recommendation. Combining a Markov approach with the social attributes of users, Wang et al. [21] proposed a probability-based recommendation model to recommend items to users.



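[Figure: Word2vec model architectures. (a) CBOW model: the context words w(t-2), w(t-1), w(t+1), and w(t+2) are combined (SUM) to predict w(t). (b) Skip-gram model: w(t) is used to predict w(t-2), w(t-1), w(t+1), and w(t+2).]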



3. Word2vec

$$
p(w \mid Context(w)) = \prod_{j=2}^{l^{w}} p\left(d_{j}^{w} \mid x_{w}, \theta_{j-1}^{w}\right) \tag{1}
$$
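The product in equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each factor p(d_j^w | x_w, θ_{j-1}^w) is taken to be a logistic (sigmoid) branch probability, as in the usual hierarchical softmax formulation, and the convention that code bit 0 takes probability σ and bit 1 takes 1 - σ is an assumption. The function names, the tiny tree, and all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def path_probability(x_w, codes, thetas):
    """Multiply the branch probabilities along the path to a word.

    x_w    : context vector (list of floats)
    codes  : code bits d_2 .. d_l of the target word (each 0 or 1)
    thetas : parameter vectors theta_1 .. theta_{l-1} of the inner nodes
    """
    prob = 1.0
    for d, theta in zip(codes, thetas):
        s = sigmoid(sum(a * b for a, b in zip(x_w, theta)))
        # Assumed convention: bit 0 -> s, bit 1 -> 1 - s.
        prob *= s if d == 0 else 1.0 - s
    return prob


# Hypothetical three-word tree: root -> (inner node -> A, B), C.
x = [0.5, -1.0]
theta_root, theta_inner = [0.2, 0.4], [-0.3, 0.1]
p_a = path_probability(x, [0, 0], [theta_root, theta_inner])
p_b = path_probability(x, [0, 1], [theta_root, theta_inner])
p_c = path_probability(x, [1], [theta_root])
```

Because every inner node splits its probability mass between its two children, the probabilities of all leaves sum to one, which is what makes the path product a valid distribution over the vocabulary.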







(4)


$$
\begin{aligned}
L &= \sum_{w \in C} \log p(w \mid Context(w)) && (5) \\
  &= \sum_{w \in C} \log \prod_{j=2}^{l^{w}} p\left(d_{j}^{w} \mid x_{w}, \theta_{j-1}^{w}\right) \\
  &= \sum_{w \in C} \sum_{j=2}^{l^{w}} \log p\left(d_{j}^{w} \mid x_{w}, \theta_{j-1}^{w}\right) && (6)
\end{aligned}
$$






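$$
L = \begin{bmatrix} l_{11} & \cdots & l_{1m} \\ \vdots & \ddots & \vdots \\ l_{m1} & \cdots & l_{mm} \end{bmatrix}, \quad
W = \begin{bmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix}, \quad
T = \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{p1} & \cdots & t_{pm} \end{bmatrix}
$$

$$
WT = W \times L \times T^{T}
= \begin{bmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix}
\times \begin{bmatrix} l_{11} & \cdots & l_{1m} \\ \vdots & \ddots & \vdots \\ l_{m1} & \cdots & l_{mm} \end{bmatrix}
\times \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{p1} & \cdots & t_{pm} \end{bmatrix}^{T} \tag{7}
$$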
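The product WT = W × L × T^T in equation (7) can be sketched in plain Python as follows. The interpretation used here is an assumption based on the index ranges: W is taken as an n × m worker-by-tag matrix, L as an m × m tag similarity matrix, and T as a p × m task-by-tag matrix, so that WT is an n × p worker-by-task relevance matrix. The helper names and the toy values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]


def worker_task_relevance(W, L, T):
    """Compute WT = W x L x T^T."""
    return matmul(matmul(W, L), transpose(T))


# Hypothetical toy data: 2 workers, 2 tags, 3 tasks.
W = [[1, 0], [0, 1]]            # worker i holds tag k
L = [[1.0, 0.5], [0.5, 1.0]]    # similarity between tags
T = [[1, 0], [0, 1], [1, 1]]    # task j carries tag k
WT = worker_task_relevance(W, L, T)
# WT == [[1.0, 0.5, 1.5], [0.5, 1.0, 1.5]]
```

Placing L between W and T^T lets a worker score on a task even when their tags differ, as long as the tags are similar: in the toy data the first worker has no tag in common with the second task but still receives a relevance of 0.5.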




$$
\left(\frac{x_{1}}{norm(x)}\right)^{2} + \left(\frac{x_{2}}{norm(x)}\right)^{2} + \cdots + \left(\frac{x_{n}}{norm(x)}\right)^{2} \tag{8}
$$

$$
x_{i}' = \frac{x_{i}}{norm(x)} \tag{9}
$$
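The normalization x'_i = x_i / norm(x) can be sketched as below. The sketch assumes that norm(x) denotes the Euclidean (L2) norm, which is the usual choice for word vectors but is an assumption here; the function names are hypothetical.

```python
import math


def l2_norm(x):
    """Euclidean length of a vector (assumed meaning of norm(x))."""
    return math.sqrt(sum(v * v for v in x))


def normalize(x):
    """Divide each component by the norm, giving a unit-length vector."""
    n = l2_norm(x)
    if n == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [v / n for v in x]


unit = normalize([3.0, 4.0])                  # components 0.6 and 0.8
sum_of_squares = sum(v * v for v in unit)     # approximately 1.0
```

Under this assumption the squared normalized components sum to one, so the dot product of two normalized vectors equals their cosine similarity.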







      L1     L2     L3     L4     L5     L6     L7
L1  1.000  0.407  0.124  0.119  0.126  0.434  0.075
L2  0.407  1.000  0.766  0.917  0.993  0.642  0.546
L3  0.124  0.766  1.000  0.930  0.477  0.526  0.744
L4  0.119  0.917  0.930  1.000  0.909  0.531  0.394
L5  0.126  0.993  0.477  0.909  1.000  0.636  0.860
L6  0.434  0.642  0.526  0.531  0.636  1.000  0.166
L7  0.075  0.546  0.744  0.394  0.860  0.166  1.000












      T1     T2     T3     T4     T5     T6     T7
U1  0.754  0.410  0.369  0.438  0.365  0.420  0.396
U2  0.680  0.694  0.712  0.706  0.682  0.747  0.720
U3  0.387  0.378  0.407  0.403  0.385  0.678  0.351
U4  0.405  0.304  0.681  0.731  0.733  0.835  0.704
U5  0.279  0.278  0.696  0.284  0.294  0.265  0.324
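A short sketch of how a table of this shape could drive recommendation: for each row U1 to U5, pick the columns T1 to T7 with the highest values. The entries are read as three-decimal values. Treating rows as workers, columns as tasks, and the entries as relevance scores is an assumption, as is the simple top-N selection rule; the function name is hypothetical.

```python
TASKS = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]

# Values copied from the table above, one row per worker.
RELEVANCE = {
    "U1": [0.754, 0.410, 0.369, 0.438, 0.365, 0.420, 0.396],
    "U2": [0.680, 0.694, 0.712, 0.706, 0.682, 0.747, 0.720],
    "U3": [0.387, 0.378, 0.407, 0.403, 0.385, 0.678, 0.351],
    "U4": [0.405, 0.304, 0.681, 0.731, 0.733, 0.835, 0.704],
    "U5": [0.279, 0.278, 0.696, 0.284, 0.294, 0.265, 0.324],
}


def top_n_tasks(worker, n):
    """Return the n tasks with the highest score for one worker."""
    ranked = sorted(zip(TASKS, RELEVANCE[worker]), key=lambda p: p[1], reverse=True)
    return [task for task, _ in ranked[:n]]


recommendations = {worker: top_n_tasks(worker, 2) for worker in RELEVANCE}
# recommendations["U1"] == ["T1", "T4"]
# recommendations["U4"] == ["T6", "T5"]
```

With n = 2, U1 is matched to T1 and T4, while U2, U3, and U4 all rank T6 first.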


Recall


(11)

Precision


(12)


(13)
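A minimal sketch of recall and precision for a recommendation list, assuming the standard set-based definitions: recall is the share of relevant tasks that were recommended, and precision is the share of recommended tasks that are relevant. The exact formulas (11) to (13) may differ, and combining the two as an F1 score is likewise an assumption. The function name and the example task lists are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and F1 for one worker."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 3 tasks recommended, 4 tasks actually taken.
precision, recall, f1 = precision_recall_f1(
    ["T1", "T4", "T6"], ["T1", "T6", "T7", "T2"]
)
```

In the example two of the three recommended tasks are relevant (precision 2/3) and two of the four relevant tasks were recommended (recall 1/2).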










6. Conclusion









Data Availability




Acknowledgments



References





































