
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3096–3104, July 5–10, 2020. ©2020 Association for Computational Linguistics


Hiring Now: A Skill-Aware Multi-Attention Model for Job Posting Generation

Liting Liu¹, Jie Liu²*, Wenzheng Zhang², Ziming Chi², Wenxuan Shi¹, Yalou Huang¹

¹College of Software, Nankai University, Tianjin, China
²College of Artificial Intelligence, Nankai University, Tianjin, China

{liu liting, wzzhang}@mail.nankai.edu.cn
[email protected]
{jliu, shiwx, huangyl}@nankai.edu.cn

Abstract

Writing a good job posting is a critical step in the recruiting process, but the task is often more difficult than many people think. It is challenging to specify the level of education, experience, and relevant skills per the company information and job description. To this end, we propose a novel task of Job Posting Generation (JPG), which we cast as a conditional text generation problem that generates job requirements according to job descriptions. To deal with this task, we devise a data-driven global Skill-Aware Multi-Attention generation model, named SAMA. Specifically, to model the complex mapping relationships between input and output, we design a hierarchical decoder in which we first label the job description with multiple skills and then generate a complete text guided by the skill labels. At the same time, to exploit the prior knowledge about skills, we further construct a skill knowledge graph to capture the global prior knowledge of skills and refine the generated results. The proposed approach is evaluated on real-world job posting data. Experimental results clearly demonstrate the effectiveness of the proposed method¹.

1 Introduction

Writing high-quality job postings is the crucial first step to attract and filter the right talents in the recruiting process of human resource management. Given job descriptions and basic company information, the key to a job posting is to write the job requirements, which requires specifying professional skills properly. Either too many or too few requirements may have a negative impact on talent recruiting. Because of the extremely large number of job positions and the variety of professional skills, a lot of companies have to pay a high cost at this step to win the war for talent.

*Corresponding Author
¹https://github.com/NKU-IIPLab/SAMA

Figure 1: An example of automatic job posting.
Basic Information: Market Researcher position; company scale 1000 ~ 10000.
Job Description: 1. Assist the General Manager in sourcing travel industry news and in conducting product research and analysis. 2. Facilitate effective communication between the market research and user experience teams. 3. Translate key industry texts and compose newsletters for internal communication.
Job Requirement: 1. 3+ years of research experience at investment banks. 2. Strong research, data analysis and communication skills. 3. Proficient user of Microsoft Suite/G Suite.


To this end, we propose the task of Job Posting Generation (JPG) in this paper, and we cast it as a novel conditional text generation task that generates the job requirement paragraph. Exploiting the ubiquitous job posting data, we aim to automatically specify the level of necessary skills and generate fluent job requirements in a data-driven manner, as shown in Figure 1.

Although the JPG task is of great significance, its complexity poses several key challenges: 1) Generating job requirements requires not only producing overall fluent text but also precisely organizing the key content, such as skills and other information, which is very difficult for current neural systems. In particular, long-text to long-text generation easily leads to missing information (Shen et al., 2019). 2) The key points of job descriptions and the skills of job requirements form complex many-to-many relations, which makes the mapping learning very difficult. 3) How to exploit the global information among the heterogeneous relations between basic company information and professional skills across the whole dataset is of great importance for generating high-quality job requirements.

To address these challenges, we focus on the richness and accuracy of skills in generated job requirements and propose a global Skill-Aware Multi-Attention (SAMA) model for the JPG task. Specifically, we devise a two-pass decoder to generate informative, accurate, and fluent job requirement paragraphs. The first-pass decoder predicts multiple skills according to the job description, which is a multi-label classification task (Zhang and Zhou, 2014). The second-pass decoder generates a complete text according to the predicted skill labels and the input text. Moreover, we build a skill knowledge graph to capture the global information in the whole job posting dataset, in addition to the local information provided by the input. Through the skill knowledge graph, our model obtains global prior knowledge to alleviate the misuse of skills. Extensive experiments are conducted to evaluate our model on real-world job posting data. The results demonstrate the effectiveness of the proposed method.

The main contributions of this paper can be summarized as follows:

• We propose a novel task of job posting generation that is defined as conditional generation: given a job description and basic company information, generate a job requirement.

• A data-driven generation approach, SAMA, is proposed to model the complex mapping relationships and generate informative and accurate job requirements.

• We build a real-world job posting dataset and conduct extensive experiments to validate the effectiveness and superiority of our proposed approach.

2 Data Description

We collect a job posting dataset from a famous Chinese online recruiting market, covering a period of 19 months from 2019 to 2020. There are 107,616 job postings in total. After removing repetitive and too short job postings, 11,221 records are selected. The dataset covers 6 different industry domains. The detailed statistics of the dataset are illustrated in Table 1.

Considering the importance of the skills for JPG, we select 2000 records and manually tag the skills in these records. Then we train a word-level LSTM-CRF model (Huang et al., 2015) to recognize the skills in the whole dataset.

                 training  validation  testing
Internet             2055         509      687
Consumer goods       1153         292      356
Real Estate           969         220      276
Finance              1477         366      463
Automobile            997         282      296
Medical               397          94      115

Table 1: The statistics of the dataset.
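For concreteness, a minimal sketch of such a word-level BiLSTM-CRF tagger is shown below. This is our illustration, not the authors' released code: it uses the third-party pytorch-crf package, and the BIO-style tag set and all layer sizes are assumptions.

```python
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

class SkillTagger(nn.Module):
    """Sketch of a word-level BiLSTM-CRF skill tagger (illustrative only).
    Assumes BIO-style tags, e.g. {O, B-SKILL, I-SKILL} -> num_tags = 3."""

    def __init__(self, vocab_size, num_tags=3, emb_dim=100, hidden=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden // 2, batch_first=True,
                            bidirectional=True)
        self.proj = nn.Linear(hidden, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, x, tags):
        # x: (batch, seq) word ids; tags: (batch, seq) gold tag ids
        emissions, _ = self.lstm(self.embed(x))
        return -self.crf(self.proj(emissions), tags)  # negative log-likelihood

    def decode(self, x):
        emissions, _ = self.lstm(self.embed(x))
        return self.crf.decode(self.proj(emissions))  # best tags per sequence
```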

We also keep the basic information, i.e., job position and company scale, because they are critical attributes of job postings that affect the level of skills.

In order to capture the global prior knowledge of skills, we construct a skill knowledge graph according to the semantic relations of entities in the job postings. As shown in Figure 2, there are three types of entities, i.e., skill, company scale, and job position. The skill entities are divided into two types, generic skills (denoted by G) and professional skills (denoted by P), according to their number of occurrences. The relation N.T.M. (need-to-master) exists between a job position entity and a skill entity. Besides, the relation IN exists between a company scale entity and a skill entity. For example, a jobseeker seeking a programmer position in a company of 10 to 100 people needs to master the professional skill C++, so there exist three triplets: (programmer, N.T.M., C++), ([10, 100], IN, C++), and (C++, type, P).
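To make the triplet construction concrete, here is a minimal sketch (ours, not the authors' code) that stores this example as Python triplets and intersects the two relations; the helper names are invented for illustration:

```python
# Minimal sketch of skill-knowledge-graph triplets (illustrative only).
# Relation names follow the paper: N.T.M. (need-to-master), IN, and type.
triplets = {
    ("programmer", "N.T.M.", "C++"),   # position needs to master skill
    (("10", "100"), "IN", "C++"),      # skill appears at this company scale
    ("C++", "type", "P"),              # C++ is a professional (P) skill
}

def skills_for(position, scale, graph):
    """Skills linked to a position via N.T.M. and to a scale via IN."""
    by_position = {t for (h, r, t) in graph if r == "N.T.M." and h == position}
    by_scale = {t for (h, r, t) in graph if r == "IN" and h == scale}
    return by_position & by_scale

print(skills_for("programmer", ("10", "100"), triplets))  # {'C++'}
```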

Figure 2: An example of the skill knowledge graph, with skill, scale, and position entities (e.g., resilience, Excel, C++; [10, 100], [100, 1000]; manager, interpreter, programmer) connected by N.T.M., IN, and type relations.

3 Approach

Let $D = \{(B_i, X_i, Y_i)\}_{i=1}^{N}$ denote the dataset, where $X_i = (x_{i,1}, x_{i,2}, ..., x_{i,m})$ is the word sequence of the job description paragraph, $Y_i = (y_{i,1}, y_{i,2}, ..., y_{i,n})$ is the word sequence of the job requirement paragraph, $B_i = (b_i^p, b_i^s)$ is the basic information, $b^p$ and $b^s$ are the job position and company scale information, $N$ is the size of the dataset, and $m$ and $n$ are the lengths of the sequences $X_i$ and $Y_i$, respectively. The target of the JPG task is to estimate $P(Y_i \mid X_i, B_i)$, the conditional probability of a job requirement $Y_i$ given a job description $X_i$ and basic information $B_i$.


Figure 3: An illustration of the architecture of SAMA, which consists of three parts: skill prediction, skill refinement, and job requirement generation. The skills $S_i$ are predicted given the job description. To incorporate the global prior knowledge of skills, the skill knowledge graph provides another set of skills $O_i$, which plays the role of refinement. Finally, SAMA fuses multiple attentions to generate the final job requirement paragraph $Y_i$.

To tackle the JPG task, we propose a global Skill-Aware Multi-Attention model, named SAMA. Figure 3 shows the overall architecture of SAMA.

Firstly, considering the importance of skill prediction in JPG, we decompose the probability $P(Y_i \mid X_i, B_i)$ into a two-stage generation process, including skill prediction and job requirement paragraph generation:

$$P(Y_i \mid X_i, B_i) = P(Y_i \mid X_i, S_i, B_i)\,P(S_i \mid X_i, B_i), \quad (1)$$

where $S_i = (s_{i,1}, s_{i,2}, ..., s_{i,l})$ is a skill² word sequence of its corresponding job requirement and $l$ is the length of $S_i$. Since $S_i$ and $B_i$ are conditionally independent given $X_i$, we can derive that $P(S_i \mid X_i, B_i) = P(S_i \mid X_i)$.

Secondly, to refine the skills, we leverage the global prior information through the skill knowledge graph $G_s = (E_1, R, E_2)$, where $E_1$ and $E_2$ are the sets of head and tail entities and $R$ is the set of relations. Given the basic information $B_i$ and the skill knowledge graph $G_s$, we obtain a set of skills $O_i = (o_{i,1}, o_{i,2}, ..., o_{i,k})$:

$$O_i = f(B_i, G_s), \quad (2)$$

where $f$ is an invertible query function, which ensures a one-to-one mapping between $B_i$ and $O_i$.

²The details of how skills are extracted are described in Section 2.

Thirdly, to fuse the local and global information, the probability $P(Y_i \mid X_i, S_i, B_i)$ during the text generation process is calculated as:

$$P(Y_i \mid X_i, S_i, B_i) = (1 - \lambda)\,P_{local}(Y_i \mid X_i, S_i, B_i) + \lambda\,P_{global}(Y_i \mid X_i, S_i, B_i), \quad (3)$$

where $\lambda$ is a hyperparameter that adjusts the balance between the two probabilities.
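As a minimal sketch of how Equation 3 can be applied at each decoding step (our illustration with toy tensors; the paper sets $\lambda = 0.5$ in Section 4.1.3):

```python
import torch

def mix(p_local, p_global, lam=0.5):
    # Equation 3: convex combination of the local and global distributions.
    # p_local: (batch, vocab); p_global must be aligned to the same
    # vocabulary (it is zero outside the skill set, see Equation 9).
    return (1.0 - lam) * p_local + lam * p_global

p_local = torch.softmax(torch.randn(2, 10), dim=-1)   # toy local distribution
p_global = torch.softmax(torch.randn(2, 10), dim=-1)  # toy global distribution
p = mix(p_local, p_global, lam=0.5)
assert torch.allclose(p.sum(-1), torch.ones(2))       # still a distribution
```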

3.1 Job Description Encoder

The input job description word sequence $X_i$ is first transformed into a sequence of word embeddings. To obtain a long-term dependency vector representation, we use a bi-directional LSTM (Schuster and Paliwal, 1997) as the text encoder. The input sequence is transformed into a hidden state sequence $H = (h_1, h_2, ..., h_m)$ by concatenating the forward and backward hidden states, $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_{m-t+1}]$. Specifically, the initial encoder hidden state $h_0$ is a zero vector, and the last encoder hidden state $h_m$ is used to initialize the skill decoder.
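A minimal PyTorch sketch of such an encoder follows. This is our illustration, with the embedding and hidden sizes taken from Section 4.1.3; the class and variable names and the toy input are assumptions:

```python
import torch
import torch.nn as nn

class JobDescriptionEncoder(nn.Module):
    """Sketch of the bi-directional LSTM encoder described above
    (embedding 100, hidden 400, as in Section 4.1.3)."""

    def __init__(self, vocab_size, emb_dim=100, hidden=400):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # hidden // 2 per direction, so the concatenated state has size `hidden`
        self.lstm = nn.LSTM(emb_dim, hidden // 2, batch_first=True,
                            bidirectional=True)

    def forward(self, x):
        # x: (batch, m) word ids -> H: (batch, m, hidden)
        H, _ = self.lstm(self.embed(x))
        return H, H[:, -1]  # all states, plus the last state h_m

enc = JobDescriptionEncoder(vocab_size=14189)
H, h_m = enc(torch.randint(0, 14189, (2, 12)))
print(H.shape, h_m.shape)  # torch.Size([2, 12, 400]) torch.Size([2, 400])
```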

3.2 Skill Prediction

Intuitively, the process of skill prediction is a Multi-Label Classification (MLC) task, which aims to assign multiple skills to each job description. To capture the correlations between skills, inspired by Yang et al. (2018), we view this MLC task as a sequence generation problem.


Formally, the skill decoder layer first takes the hidden state $h_m$ of the encoder as input, then derives a context vector $C^{st}$ through an attention mechanism (Luong et al., 2015) to help predict the skill labels:

$$\alpha_i^j = \frac{\exp(g_{j-1}'^{\top} W^1 h_i)}{\sum_{i'} \exp(g_{j-1}'^{\top} W^1 h_{i'})}; \quad C_j^{st} = \sum_{i=1}^{m} \alpha_i^j h_i, \quad (4)$$

where $W^1 \in \mathbb{R}^{d \times d}$ is a trainable weight matrix and $d$ is the hidden vector size. Inspired by Yuan et al. (2018), the job description is labelled with multiple skills by generating a skill sequence that joins the skills with the delimiter <SEP> and has an unfixed number of skills (e.g., English <SEP> computer science <SEP> c++). The skill decoder is based on an LSTM, whose hidden vector is computed by:

$$g_t' = \mathrm{LSTM}(g_{t-1}', C_t^{st}). \quad (5)$$

Specifically, the last skill decoder hidden state $g_l'$ is used to initialize the text decoder. The skill sequence is finally obtained by a softmax classification over the vocabulary of skills, $V_{skill}$. In detail, a non-linear transformation is applied to form the skill decoder semantic representation $I^{st}$, and then the probability $P(S_i \mid X_i, B_i)$ is computed via:

$$I_j^{st} = \tanh(W^2 [g_j'; C_j^{st}]), \quad P(s_{i,j} \mid X_i) = \mathrm{softmax}(W^3 I_j^{st} + b^3), \quad (6)$$

where $[;]$ denotes vector concatenation, and $W^2 \in \mathbb{R}^{d \times 2d}$, $W^3 \in \mathbb{R}^{|V_{skill}| \times d}$, and $b^3 \in \mathbb{R}^{|V_{skill}|}$ are parameters.
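Equation 4 is a general (bilinear) Luong score; a minimal sketch of one attention step is shown below. The function and variable names are ours, and the toy shapes are assumptions:

```python
import torch

def luong_attention(g_prev, H, W):
    """Sketch of Equation 4: attention of the previous skill-decoder state
    over the encoder states. Shapes: g_prev (batch, d), H (batch, m, d),
    W (d, d)."""
    scores = torch.einsum("bd,de,bme->bm", g_prev, W, H)  # g'^T W^1 h_i
    alpha = torch.softmax(scores, dim=-1)                 # alpha_i^j
    C_st = torch.einsum("bm,bmd->bd", alpha, H)           # sum_i alpha_i h_i
    return C_st, alpha

d = 400
C_st, alpha = luong_attention(torch.randn(2, d), torch.randn(2, 12, d),
                              torch.randn(d, d))
print(C_st.shape, alpha.shape)  # torch.Size([2, 400]) torch.Size([2, 12])
```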

3.3 Skill Refinement

The process of skill prediction only considers the local information, which results in some misuse of skills. To refine the skills of the generated job requirement, the global information is taken into account through the skill knowledge graph.

The skill entities are divided into G and P as described in Section 2. Here, the basic assumption is that a generic skill appears more frequently than a professional skill among all the job postings, because a professional skill carries more domain-specific characteristics. We use a hyperparameter $\theta$ as a threshold to divide the skill entities.

Given the basic information $B_i = (b_i^p, b_i^s)$, the set of skills $O_i$ is obtained from the skill knowledge graph by the query function $f$. In detail, we first obtain the set of entities that have the "N.T.M." relation with $b_i^p$ and the set of entities that have the "IN" relation with $b_i^s$. Secondly, we take the intersection of the sets obtained in the first step. Finally, we keep the entities whose type is P.
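A minimal sketch of this three-step query (our illustration; the dictionary-based graph representation and all names are assumptions):

```python
def query_skills(position, scale, ntm, within, skill_type, keep="P"):
    """Sketch of the query function f (Equation 2): intersect the skills a
    position needs to master with the skills observed at a company scale,
    then keep only the professional (P) skills."""
    candidates = ntm.get(position, set()) & within.get(scale, set())
    return {s for s in candidates if skill_type.get(s) == keep}

# Toy graph: programmers need C++ and Excel; small companies use C++.
ntm = {"programmer": {"C++", "Excel"}}
within = {("10", "100"): {"C++"}}
skill_type = {"C++": "P", "Excel": "G"}
print(query_skills("programmer", ("10", "100"), ntm, within, skill_type))
# {'C++'}
```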

We embed $O_i$ as $S_i' = (s_{i,1}', s_{i,2}', ..., s_{i,k}')$ and linearly combine it into a skill graph context vector $C_j^{nd}$ through an attention mechanism:

$$\tau_i^j = \frac{\exp(g_{j-1}^{\top} W^4 s_i')}{\sum_{i'} \exp(g_{j-1}^{\top} W^4 s_{i'}')}; \quad C_j^{nd} = \sum_{i=1}^{k} \tau_i^j s_i', \quad (7)$$

where $W^4 \in \mathbb{R}^{d \times d'}$ are parameters and $d'$ is the dimension of the word embeddings. Then a non-linear transformation is applied to form the graph skill semantic representation $I^{nd}$. The probability $P_{global}(Y_i \mid X_i, S_i, B_i)$ over $V_{skill}$ is computed via:

$$I_j^{nd} = \tanh(W^5 [g_j; C_j^{nd}; C_j^{rd}]), \quad (8)$$

$$P_{global}(y_{i,j} = w \mid X_i, S_i, B_i) = \begin{cases} \mathrm{softmax}(W^6 I_j^{nd} + b^6), & w \in O_i \\ 0, & w \notin O_i, \end{cases} \quad (9)$$

where $g$ and $C^{rd}$ will be introduced in the next section, and $W^5 \in \mathbb{R}^{d \times (2d + d')}$, $W^6 \in \mathbb{R}^{|V_{skill}| \times d}$, and $b^6 \in \mathbb{R}^{|V_{skill}|}$ are trainable parameters.
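A minimal sketch of the restriction in Equation 9 (ours): probability mass is simply zeroed for words outside $O_i$, as the equation states, with no renormalization; the membership mask is an assumed input built from the query above.

```python
import torch

def p_global(I_nd, W6, b6, oi_mask):
    """Sketch of Equation 9: softmax over V_skill, then zero out every word
    not in O_i, so the global distribution can only propose skills that the
    knowledge graph recommends."""
    probs = torch.softmax(I_nd @ W6.T + b6, dim=-1)  # (batch, |V_skill|)
    return probs * oi_mask                           # zero outside O_i

d, v_skill = 400, 8
oi_mask = torch.tensor([1., 0., 1., 0., 0., 0., 1., 0.]).expand(2, v_skill)
print(p_global(torch.randn(2, d), torch.randn(v_skill, d),
               torch.randn(v_skill), oi_mask))
```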

3.4 Job Requirement Generation

Job requirement generation fuses multiple attention mechanisms from three sources: the job description, the predicted skills, and the skills from the skill knowledge graph. The text decoder, based on another LSTM, aims to generate the final word sequence. The hidden vector of the text decoder is computed by $g_t = \mathrm{LSTM}(e_{t-1}, g_{t-1})$, where $e_{t-1}$ is the word embedding of the generated target word at time step $t-1$. After obtaining $g$, a non-linear transformation is applied to form the text decoder semantic representation $I^{rd}$. The probability $P_{local}(Y_i \mid X_i, S_i, B_i)$ is computed via:

$$I_j^{rd} = \tanh(W^7 [e_{j-1}; g_j; C_j^{rd}; C_j^{th}]), \quad (10)$$

$$P_{local}(y_{i,j} \mid X_i, S_i, B_i) = \mathrm{softmax}(W^8 I_j^{rd} + b^8), \quad (11)$$

where $W^7 \in \mathbb{R}^{d \times 2(d + d')}$, $W^8 \in \mathbb{R}^{|V_{text}| \times d}$, and $b^8 \in \mathbb{R}^{|V_{text}|}$ are parameters, $V_{text}$ is the vocabulary of job requirements, and $V_{skill}$ is a subset of $V_{text}$. Both $C^{rd}$ and $C^{th}$ are context vectors generated by attention mechanisms. Specifically, $C^{rd}$ is computed similarly to $C^{st}$ because both directly take the input sequence into account:

$$\beta_i^j = \frac{\exp(g_{j-1}^{\top} W^9 h_i)}{\sum_{i'} \exp(g_{j-1}^{\top} W^9 h_{i'})}; \quad C_j^{rd} = \sum_{i=1}^{m} \beta_i^j h_i, \quad (12)$$

where $W^9 \in \mathbb{R}^{d \times d}$.

In addition, the skills $S$ generated by the skill decoder are fed into the text decoder to guide the generation process. To obtain $C^{th}$, another attention model is leveraged:

$$\gamma_i^j = \frac{\exp(g_{j-1}^{\top} W^{10} s_i)}{\sum_{i'} \exp(g_{j-1}^{\top} W^{10} s_{i'})}; \quad C_j^{th} = \sum_{i=1}^{l} \gamma_i^j s_i, \quad (13)$$

where $W^{10} \in \mathbb{R}^{d \times d'}$ are parameters.

The generation probability $P(Y_i \mid X_i, S_i, B_i)$ is the weighted sum of $P_{local}(Y_i \mid X_i, S_i, B_i)$ and $P_{global}(Y_i \mid X_i, S_i, B_i)$, as in Equation 3. As shown in Equation 8 and Equation 10, the vector $C^{th}$ appears explicitly only in $P_{local}$, which implies that $P_{local}$ puts emphasis on the skill prediction, i.e., the local information, while the vector $C^{nd}$ appears explicitly only in $P_{global}$, which indicates that $P_{global}$ focuses on the skills given by the skill knowledge graph, i.e., the global prior knowledge.

In this way, SAMA considers not only the local information from the job description but also the global information from the skill knowledge graph.
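A minimal sketch of one local decoding step (Equations 10 and 11) is given below. This is our illustration: the paper states $W^7 \in \mathbb{R}^{d \times 2(d+d')}$, whereas here the fusion layer is sized directly from the concatenated vectors for simplicity, and all names and toy shapes are assumptions.

```python
import torch
import torch.nn as nn

def local_step(e_prev, g_j, C_rd, C_th, W7, W8, b8):
    """Sketch of Equations 10-11: the text-decoder state is fused with the
    description context C_rd and the predicted-skill context C_th, then
    projected onto the job-requirement vocabulary V_text."""
    I_rd = torch.tanh(W7(torch.cat([e_prev, g_j, C_rd, C_th], dim=-1)))
    return torch.softmax(W8(I_rd) + b8, dim=-1)   # P_local over V_text

d, d_emb, v_text = 400, 100, 18612
W7 = nn.Linear(d_emb + 3 * d, d, bias=False)  # sized from the concatenation
W8 = nn.Linear(d, v_text, bias=False)
b8 = torch.zeros(v_text)
p = local_step(torch.randn(2, d_emb), torch.randn(2, d),
               torch.randn(2, d), torch.randn(2, d), W7, W8, b8)
print(p.shape)  # torch.Size([2, 18612])
```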

3.5 Training and Inference

The loss function of the model has two parts: the negative log-likelihood of the silver³ skill labels, $L_S$, and that of the gold⁴ job requirement text, $L_Y$:

$$L_S = -\sum_{i=1}^{l} \log P(S \mid X, B), \quad L_Y = -\sum_{i=1}^{n} \log P(Y \mid X, S, B), \quad L = L_S + \mu L_Y, \quad (14)$$

where $\mu$ is a hyperparameter; we give more weight to the loss of the gold job requirement. During inference, the outputs of the skill decoder and the text decoder are predicted as follows:

$$\hat{S} = \arg\max_{S} P(S \mid X, B), \quad (15)$$

$$\hat{Y} = \arg\max_{Y} P(Y \mid X, \hat{S}, B). \quad (16)$$

³The skill labels are silver standard because they were not created by an expert but extracted by a trained model.

⁴The job requirement text is gold standard because it was written by humans and posted online.

For each stage, we obtain the best results by utilizing greedy search at each step.
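A minimal sketch of Equation 14 and the greedy inference of Equations 15 and 16 (our illustration; logits and vocabulary sizes are toy values, with $\mu = 1.4$ as in Section 4.1.3):

```python
import torch
import torch.nn.functional as F

def sama_loss(skill_logits, skill_targets, text_logits, text_targets, mu=1.4):
    """Sketch of Equation 14: the joint loss is the negative log-likelihood
    of the silver skill labels plus mu times that of the gold requirement
    text. Logits: (steps, vocab); targets: (steps,)."""
    L_S = F.cross_entropy(skill_logits, skill_targets, reduction="sum")
    L_Y = F.cross_entropy(text_logits, text_targets, reduction="sum")
    return L_S + mu * L_Y

# Toy shapes: 5 skill steps over V_skill, 8 text steps over V_text.
loss = sama_loss(torch.randn(5, 3523), torch.randint(0, 3523, (5,)),
                 torch.randn(8, 18612), torch.randint(0, 18612, (8,)))

# Greedy inference (Equations 15-16): take the argmax at every step, first
# for the skill sequence, then for the requirement text conditioned on it.
next_word = torch.randn(1, 18612).argmax(dim=-1)
```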

4 Experiments

In this section, we conduct experiments to verify the effectiveness of SAMA.

4.1 Experimental Setup

4.1.1 Datasets

Job descriptions and job requirements are tokenized by the Pyltp⁵ word segmenter. Table 1 shows the split of the dataset. There are 468 position entities, 9 scale entities, 31,090 skill entities, and 310,413 relation edges in the skill knowledge graph. The vocabulary of job descriptions contains 14,189 words, the vocabulary of skills contains 3,523 words, and the vocabulary of job requirements contains 18,612 words.

4.1.2 Comparison Models

To achieve a comprehensive and comparative analysis of SAMA, we compare it with two kinds of representative models: standard generation models and hierarchical generation models.

• S2SA: Seq2Seq with attention (Luong et al., 2015) is a standard generation model.

• DelNet: The deliberation networks model (Xia et al., 2017) is a hierarchical generation model with a two-pass decoder that generates and polishes the same target sequence.

• VPN: Vocabulary pyramid networks (Liu et al., 2019) is a hierarchical generation model with a multi-pass encoder and decoder that generates a multi-level target sequence.

• SAMA(w/o pred): A degraded version of SAMA that removes the skill prediction process, for the ablation test.

• SAMA(w/o graph): Another degraded version of SAMA that removes the skill refinement process.

4.1.3 Network Configuration

For all models, we pretrain word2vec (Mikolov et al., 2013) on the job posting dataset. We set the word embedding dimension to 100 and the hidden vector size to 400 in both encoding and decoding. We set the maximum number of words in each skill sequence and each job requirement to 30 and 150, respectively. The weighted parameters $\lambda$ and $\mu$ are set to 0.5 and 1.4, respectively. The threshold $\theta$ is set to 100. We apply dropout (Zaremba et al., 2014) at a rate of 0.3. Models are trained for 15 epochs with the Adam optimizer (Kingma and Ba, 2015), and the batch size is 5.

⁵https://github.com/HIT-SCIR/pyltp


Models            BLEU-1  BLEU-2  BLEU-3  BLEU-4  ROUGE-1  ROUGE-2  ROUGE-3  ROUGE-4
S2SA               44.78   29.96   20.33   13.11    44.43    20.02     8.87     3.62
DelNet             37.10   25.35   18.28   12.62    44.21    19.29     8.42     3.08
VPN                34.15   23.26   16.90   11.68    40.16    16.82     6.96     2.63
SAMA(w/o pred)     44.70   31.59   23.09   16.32    45.87    22.78    11.25     5.75
SAMA(w/o graph)    45.49   31.89   23.32   16.40    45.93    22.85    11.37     5.84
SAMA               46.15   32.44   23.77   16.83    46.37    23.27    12.17     6.16

Table 2: Word overlap based metrics.


4.1.4 Evaluation Metrics

To evaluate the performance of SAMA, we employ the following metrics:

Word overlap based metrics: To evaluate the overall text generation quality, we employ BLEU-N (Papineni et al., 2002) and ROUGE-N (Lin, 2004) as evaluation metrics; BLEU-N is a precision-based metric and ROUGE-N is a recall-based metric.

Skill prediction metrics: Since the correctness of the generated skills is of great importance in JPG, we further evaluate the quality of the skills in generated job requirements using Precision, Recall, and F1. To achieve this, we extract the skills in the ground truth and the generated text with a matching method based on the skill vocabulary $V_{skill}$.
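A minimal sketch of this matching-based evaluation (ours; whitespace tokenization and the toy vocabulary are simplifying assumptions, since the actual texts are segmented Chinese):

```python
def skill_prf(reference, generated, v_skill):
    """Extract skills from both texts by matching against V_skill, then
    compute Precision, Recall, and F1 on the two skill sets."""
    ref = {w for w in reference.split() if w in v_skill}
    gen = {w for w in generated.split() if w in v_skill}
    tp = len(ref & gen)
    p = tp / len(gen) if gen else 0.0
    r = tp / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

v_skill = {"C++", "Excel", "English"}
print(skill_prf("needs C++ and English", "needs C++ and Excel", v_skill))
# (0.5, 0.5, 0.5)
```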

Human-based evaluation: Since it is difficult to measure the comprehensive quality of the generated texts, i.e., both the fluency of the text and the accuracy of the skills, in addition to the automatic metrics above we conduct a subjective evaluation. Three graduate student volunteers are asked to evaluate the generated paragraphs. We randomly sample 50 pieces of data from the testing set. The job requirements generated by the different models are pooled and randomly shuffled for each volunteer. Each generated paragraph is rated as bad (irrelevant skills or disfluent sentences), normal (basically relevant skills and fluent sentences), or good (rich and relevant skills and fluent sentences).

4.2 Results and Analysis

4.2.1 Overall Performance

Table 2 shows the results of the word overlap based metrics.

Figure 4: Skill prediction metrics (Precision, Recall, and F1) for S2SA, DelNet, VPN, SAMA(w/o pred), SAMA(w/o graph), and SAMA.

In terms of BLEU-N and ROUGE-N, SAMA performs the best on all word overlap based metrics, which suggests that our model produces more words that overlap with the ground truth. SAMA(w/o graph) and SAMA(w/o pred) obtain competitive results, and both are significantly better than the baselines, which demonstrates the effectiveness of skill prediction and of the prior knowledge of skills, respectively.

In addition to the overall metrics, Figure 4 further reports the skill-level metrics. It shows that the job requirements generated by the skill-aware models (SAMA(w/o pred), SAMA(w/o graph), and SAMA) contain more accurate and richer skills than those generated by the baselines (S2SA, DelNet, and VPN). Among them, SAMA achieves the best performance. Besides, SAMA(w/o graph) obtains a higher recall, which demonstrates that skill prediction can enrich the skill information effectively, and SAMA(w/o pred) obtains a higher precision, which demonstrates that skill refinement can refine the skill information effectively.

4.2.2 Human-based Evaluation

The results of the human-based annotation are shown in Table 3. It can be seen that the skill-aware models obtain more relevant and informative ("good") results than the baselines, and SAMA obtains the most "good" results and the fewest "bad" results. These results are consistent with the automatic metrics. S2SA obtains the most "normal" results because its job requirements, although fluent, contain less rich and accurate skills. DelNet and VPN obtain a large percentage of "bad" results, mainly because of repeated sentences.


Model              bad   normal  good  Kappa
S2SA               0.34   0.50   0.16   0.44
DelNet             0.48   0.34   0.18   0.41
VPN                0.56   0.32   0.12   0.38
SAMA(w/o pred)     0.28   0.42   0.30   0.42
SAMA(w/o graph)    0.26   0.42   0.32   0.43
SAMA               0.22   0.40   0.38   0.42

Table 3: Human-based evaluation results.

Figure 5: Visualization. The y-axes represent the skills of a generated job requirement (e.g., interior design, EA, undergraduate, real-estate, design-institute), and the x-axes of the upper, lower left, and lower right panels are the input job description X, the recommended skills O from the skill knowledge graph, and the skills S produced by the skill prediction, respectively.

Besides, SAMA(w/o pred) and SAMA(w/o graph) are both much worse than SAMA on "good" results. This is because SAMA(w/o pred) misses some skills and SAMA(w/o graph) misuses some skills. All models have kappa scores around 0.4, indicating that the evaluators reach high agreement.

4.2.3 Visualization Analysis

When the model generates the target sequence, different words contribute differently. SAMA can synthetically select the most informative words by utilizing the three attention mechanisms. Figure 5⁶ shows the visualization of the three attention mechanisms. According to Figure 5, when SAMA generates the skill "EA (Environmental Art)", it automatically assigns larger weights to the more informative words in the three sources, e.g., 'interior' in X; 'interior, design, construction, matching' in O; and 'interior, design, drawing, management' in S. This shows that SAMA can weigh the different contributions and automatically capture the most informative words from multiple sources.

4.2.4 Case Study

To illustrate the difference in quality between SAMA and the compared models, we give an example of the generated text in Figure 6, where we compare SAMA with the strong baseline S2SA.

⁶Due to space limitations, we show only part of the texts.

Input:
1、负责完成公司下达的年度销售指标。2、将年度指标分解至季度、月度并加以执行。3、确保客户订单及时回款,确保无逾期、呆账等。4、渠道新客户开发及老客户的维护。
(1. Responsible for completing the annual sales targets issued by the company. 2. Decompose annual targets into quarterly and monthly targets and implement them. 3. Ensure that customer orders are repaid on time, with no overdue or bad debts. 4. Develop new channel customers and maintain old customers.)

Gold Output:
1、高中以上学历,具备一定的销售经验。2、有礼赠品团购渠道销售经验者优先。3、忠诚度高、服从管理、有团队协作精神。
(1. High school education or above, with some sales experience. 2. Sales experience in the gift group-buying channel is preferred. 3. High loyalty, obedient to management, and teamwork spirit.)

SAMA Output:
1、高中以上学历,1年以上销售经验,有销售运营类管理更加的优先考虑;2、有礼赠品团购终端客户服务体系的工作经验、熟悉礼品销售者优先;3、有团队合作精神,能承受较大的工作压力。
(1. High school education or above, more than 1 year of sales experience; sales operations management experience is preferred. 2. Working experience in the gift group-buying terminal customer service system and familiarity with gift sales are preferred. 3. Team spirit, able to bear high working pressure.)

S2SA Output:
1、高中及以上学历,市场营销等相关专业;2、2年以上销售行业工作经验,有铝艺门窗或建材行业销售经验者优先。
(1. High school education or above, marketing or other related majors. 2. More than 2 years of working experience in sales; sales experience in the aluminum door and window or building materials industry is preferred.)

Figure 6: Case study. We translate Chinese to English. Skills in bold print are the correct and accurate skills, underlined skills are the correct but inaccurate skills, and italic skills are the incorrect skills.

As shown in Figure 6, SAMA captures all three aspects of the ground truth, while S2SA misses the third aspect. Besides, in every aspect SAMA generates more correct and accurate skills, while S2SA clearly does not perform well enough and generates inaccurate skills. Generally, the main consideration of job seekers is the skills they need to master, such as Python, English, and Go. Therefore, although S2SA generates some right words, like "preferred", this does not increase the quality of the generated text, because the skills themselves are inaccurate.

4.2.5 Parameter Analysis

We show how the two key hyperparameters of SAMA, $\lambda$ and $\mu$, influence the performance in Figure 7. The hyperparameter $\lambda$ adjusts the balance between the probabilities $P_{local}$ and $P_{global}$, and $\mu$ adjusts the balance between the two losses, the skill prediction loss $L_S$ and the job requirement generation loss $L_Y$.

The value of the hyperparameter $\lambda$ varies from 0.1 to 0.9; a bigger value implies more global prior knowledge of skills. Figure 7 shows that the performance peaks as $\lambda$ increases. It is intuitive that prior knowledge can help generate accurate and rich skills.


Figure 7: Parameter analysis. BLEU-1 and BLEU-4 as functions of the weighted parameters λ (0.1 to 0.9) and µ (1.1 to 2.0).

However, a too large value may sacrifice fluency.

The value of the hyperparameter $\mu$ varies from 1.1 to 2.0. We give greater weight to the loss of job requirement generation because it is the target of the JPG task. As observed in Figure 7, a weight close to 1 may introduce noise from the skill labels. Besides, as the weight increases towards 2, the model becomes incapable of fully considering the skill labels.

5 Related Work

The related works fall into two categories: human resource management and generation models.

5.1 Human Resource Management

Human Resource Management (HRM) is an appealing topic for applied researchers, and recruitment is a key part of HRM. With the explosive growth of recruiting data, many studies focus on efficient automatic HRM, e.g., person-organization fit, intelligent job interviews, and job skill ranking. Lee and Brusilovsky (2007) designed a job recommender system that considers the preferences of both employers and candidates. Qin et al. (2019) proposed a personalized question recommender system for job interviews to better interview candidates. Naim et al. (2015) analyzed interview videos to quantify verbal and nonverbal behaviors in the context of job interviews. Sun et al. (2019) studied the compatibility of person and organization. Xu et al. (2018) proposed a data-driven approach for modeling the popularity of job skills. Besides, some augmented writing tools, such as Textio⁷ and TapRecruit⁸, have been developed to assist HR in writing job postings by taking a draft as input and then polishing it.

In this paper, we also aim to improve the efficiency of HRM, from the perspective of job posting writing, which is the crucial first step in the recruitment process.

⁷https://textio.com/products/
⁸https://taprecruit.co/

5.2 Generation Models

Many practical applications are modeled as generation tasks, such as keyword extraction, headline generation, and response generation. Many generation tasks are formulated as Seq2Seq learning problems, and plenty of studies have focused on optimizing the Seq2Seq model. For example, Lopyrev (2015) trained a Seq2Seq model with attention for the headline generation task. Xing et al. (2017) incorporated topic information into Seq2Seq through a joint attention mechanism to generate informative responses for chatbots. Meng et al. (2017) applied a Seq2Seq model with a copy mechanism to a keyword extraction task.

However, models that do not explicitly model sentence planning are limited in generating complex argument structures that depend on hierarchy. Dong and Lapata (2018) decomposed the semantic parsing process into sketch generation and detail filling and proposed a structure-aware neural architecture. Zhang et al. (2019) formulated the outline generation task as a hierarchical structured prediction problem and proposed HiStGen. Puduppully et al. (2019) proposed a two-stage model that incorporates content selection and planning for the data-to-text generation task.

Similar to the above studies, we propose a hierarchical generation model, SAMA, which first labels the job description with multiple skills and then generates the job requirement paragraph, to tackle the JPG task. Different from prior art, SAMA considers the global information across the whole dataset to generate high-quality job requirements.

6 Conclusion

In this paper, we proposed the job posting generation (JPG) task and formalized it as a conditional text generation problem. We then proposed a novel model, SAMA, for this task. The merits of SAMA come from three aspects. Firstly, it decomposes the long text generation into two stages: an MLC task and a multiple-skill-guided text generation task. Secondly, it considers both the local and the global information to generate accurate and rich skills. Last but not least, the learned mapping relationships can be applied to various downstream tasks, such as automatic resume generation and person-job fit. Extensive experiments conducted on real-world job posting data demonstrated the effectiveness and superiority of SAMA.


Acknowledgement

This research was supported by the National Natural Science Foundation of China under grant No. 61976119, the Natural Science Foundation of Tianjin under grant No. 18JCYBJC15800, and the Major Program of Science and Technology of Tianjin under grant No. 18ZXZNGX00310.

References

Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 731–742.

Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.

Danielle H. Lee and Peter Brusilovsky. 2007. Fighting information overflow with personalized comprehensive information access: A proactive job recommender. In ICAS'07, pages 21–21.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries.

Cao Liu, Shizhu He, Kang Liu, and Jun Zhao. 2019. Vocabulary pyramid network: Multi-pass encoding and decoding with multi-level vocabularies for response generation. In ACL, pages 3774–3783.

Konstantin Lopyrev. 2015. Generating news headlines with recurrent neural networks.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In EMNLP, pages 1412–1421.

Rui Meng, Sanqiang Zhao, Shuguang Han, Daqing He, Peter Brusilovsky, and Yu Chi. 2017. Deep keyphrase generation. In ACL, pages 582–592.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations.

Iftekhar Naim, M. Iftekhar Tanveer, Daniel Gildea, and Mohammed Ehsan Hoque. 2015. Automated prediction and analysis of job interview performance: The role of what you say and how you say it. In International Conference and Workshops on Automatic Face and Gesture Recognition, pages 1–6.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318.

Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text generation with content selection and planning. In The Thirty-Third AAAI Conference on Artificial Intelligence, pages 6908–6915.

Chuan Qin, Hengshu Zhu, Chen Zhu, Tong Xu, Fuzhen Zhuang, Chao Ma, Jingshuai Zhang, and Hui Xiong. 2019. DuerQuiz: A personalized question recommender system for intelligent job interview. In KDD, pages 2165–2173.

Mike Schuster and Kuldip K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681.

Dinghan Shen, Asli Celikyilmaz, Yizhe Zhang, Liqun Chen, Xin Wang, and Lawrence Carin. 2019. Hierarchically-structured variational autoencoders for long text generation.

Ying Sun, Fuzhen Zhuang, Hengshu Zhu, Xin Song, Qing He, and Hui Xiong. 2019. The impact of person-organization fit on talent management: A structure-aware convolutional neural network approach. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1625–1633.

Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, and Tie-Yan Liu. 2017. Deliberation networks: Sequence generation beyond one-pass decoding. In Advances in Neural Information Processing Systems, pages 1784–1794.

Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, and Wei-Ying Ma. 2017. Topic aware neural response generation. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pages 3351–3357.

Tong Xu, Hengshu Zhu, Chen Zhu, Pan Li, and Hui Xiong. 2018. Measuring the popularity of job skills in recruitment market: A multi-criteria approach. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 2572–2579.

Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, and Houfeng Wang. 2018. SGM: Sequence generation model for multi-label classification. In Proceedings of the 27th International Conference on Computational Linguistics.

Xingdi Yuan, Tong Wang, Rui Meng, Khushboo Thaker, Daqing He, and Adam Trischler. 2018. Generating diverse numbers of diverse keyphrases.

Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization.

Min-Ling Zhang and Zhi-Hua Zhou. 2014. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8):1819–1837.

Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, and Xueqi Cheng. 2019. Outline generation: Understanding the inherent content structure of documents. In SIGIR, pages 745–754.

