
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5682–5693

June 6–11, 2021. ©2021 Association for Computational Linguistics


Open Hierarchical Relation Extraction

Kai Zhang1∗, Yuan Yao1∗, Ruobing Xie2, Xu Han1, Zhiyuan Liu1†, Fen Lin2, Leyu Lin2, Maosong Sun1

1Department of Computer Science and Technology, Institute for Artificial Intelligence, Tsinghua University, Beijing, China

Beijing National Research Center for Information Science and Technology, China

2WeChat Search Application Department, Tencent, China

[email protected], [email protected]

Abstract

Open relation extraction (OpenRE) aims to extract novel relation types from open-domain corpora, which plays an important role in completing the relation schemes of knowledge bases (KBs). Most OpenRE methods cast different relation types in isolation without considering their hierarchical dependency. We argue that OpenRE is inherently in close connection with relation hierarchies. To address the bidirectional connections between OpenRE and relation hierarchy, we propose the task of open hierarchical relation extraction and present a novel OHRE framework for the task. To effectively integrate hierarchy information into relation representations for better novel relation extraction, we propose a dynamic hierarchical triplet objective and a hierarchical curriculum training paradigm. We also present a top-down hierarchy expansion algorithm to add the extracted relations into existing hierarchies with reasonable interpretability. Comprehensive experiments show that OHRE outperforms state-of-the-art models by a large margin on both relation clustering and hierarchy expansion. The source code and experiment details of this paper can be obtained from https://github.com/thunlp/OHRE.

1 Introduction

Open relation extraction (OpenRE) aims to extract novel relation types between entities from open-domain corpora, which plays an important role in completing the relation schemes of knowledge bases (KBs). OpenRE models are mainly categorized into two groups, namely tagging-based and clustering-based methods. Tagging-based methods consider OpenRE as a sequence labeling task, which extracts relational phrases from sentences (Banko et al., 2007; Cui et al., 2018).

∗ Indicates equal contribution. † Corresponding author: Z. Liu ([email protected])

Figure 1: The workflow of the OHRE framework. Trained with relation hierarchy and labeled instances, OHRE extracts novel relations from open-domain corpora and adds them into the existing hierarchy.

In contrast, clustering-based methods aim to cluster relation instances into groups based on their semantic similarities, and regard each cluster as a relation (Yao et al., 2011; Wu et al., 2019).

However, most OpenRE models cast different relation types in isolation, without considering their rich hierarchical dependencies. Hierarchical organization of relations has been shown to play a central role in the abstraction and generalization ability of humans (Tenenbaum et al., 2011). This hierarchical organization of relations also constitutes the foundation of most modern KBs (Auer et al., 2007; Bollacker et al., 2008). Figure 1 illustrates an example of a relation hierarchy in Wikidata (Vrandečić and Krötzsch, 2014). Such relation hierarchies are crucial in establishing the relation schemes of KBs, and could also help users better understand and utilize relations in various downstream tasks.

However, manually establishing and maintaining the ever-growing relation hierarchies requires expert knowledge and is time-consuming, given the usually large quantity of relations in the existing hierarchy and the rapid emergence of novel relations in open-domain corpora.¹


Since the ultimate goal of OpenRE is to automatically establish and maintain relation schemes for KBs, it is desirable to develop OpenRE methods that can directly add the extracted novel relations into the existing incomplete relation hierarchy. Moreover, incorporating the hierarchical information of existing relations can also help OpenRE methods to model their interdependencies. Such refined semantic connections among existing relations can provide transferable guidance to better extract new relations.

Given the inherent bidirectional connections between OpenRE and relation hierarchy, in this work we aim to introduce relation hierarchy information to improve OpenRE performance, and to directly add the extracted new relations into the existing hierarchy, which presents unique challenges. We propose a novel framework, OHRE, to consider relation hierarchy in OpenRE. The key intuition behind our framework is that the distance between relations in the hierarchy reflects their semantic similarity. Therefore, nearby relations should share similar representations, and vice versa. Figure 1 shows the framework of OHRE, which consists of two components:

(1) In relation representation learning, we design a dynamic hierarchical triplet objective to integrate hierarchy information into relation representations. We also present a hierarchical curriculum learning strategy for progressive and robust training. (2) In relation hierarchy expansion, we first cluster instances into new relation prototypes and then conduct a top-down hierarchy expansion algorithm to locate new relations in the hierarchy. In this way, OHRE encodes hierarchical information into relation representations, which improves classical OpenRE and further enables hierarchy expansion.

To verify the effectiveness of hierarchical information and the proposed framework, we conduct experiments over two evaluations, including the classical relation clustering task and a novel hierarchy expansion task. Experimental results on two real-world datasets show that our framework brings significant improvements on the two tasks, even with only a partially available hierarchy from KBs.

The main contributions of this work are summarized as follows: (1) To the best of our knowledge, we are the first to address the bidirectional connections between OpenRE and relation hierarchy. We propose a novel open hierarchical relation extraction task, which aims to provide new relations and their hierarchical structures simultaneously.

¹ E.g., the number of relations in Wikidata has grown to more than 8,000 in the last 6 years.

(2) We present a novel OHRE framework for the proposed task, which integrates hierarchical information into relation representations for better relation clustering, and directly expands existing relation hierarchies with a top-down algorithm. (3) Comprehensive experiments on two real-world datasets demonstrate the effectiveness of OHRE on both relation clustering and hierarchy expansion.

2 Related Works

Open Relation Extraction. Recent years have witnessed an upsurge of interest in open relation extraction (OpenRE), which aims to identify new relations in unsupervised data. Existing OpenRE methods can be divided into tagging-based methods and clustering-based methods. Tagging-based methods seek to extract the surface forms of relational phrases from text in unsupervised (Banko et al., 2007; Banko and Etzioni, 2008) or supervised paradigms (Angeli et al., 2015; Cui et al., 2018; Stanovsky et al., 2018). However, many relations cannot be explicitly represented as surface forms, and it is hard to align different relational tokens with the same meanings.

In contrast, traditional clustering-based OpenRE methods extract rich features of sentences and cluster the features into novel relation types (Lin and Pantel, 2001; Yao et al., 2011, 2012; Elsahar et al., 2017). Marcheggiani and Titov (2016) propose a discrete-state variational autoencoder (VAE) that optimizes a relation classifier via reconstruction signals. Simon et al. (2019) introduce a skewness loss to enable stable training of the VAE. Hu et al. (2020) learn relation representations and clusters iteratively via self-training. Wu et al. (2019) improve conventional unsupervised clustering-based methods by combining supervised and unsupervised data via siamese networks, and achieve state-of-the-art performance. However, existing OpenRE methods cast different relation types in isolation without considering their rich hierarchical dependencies.

Hierarchy Information Exploitation. Well-organized taxonomies and hierarchies can facilitate many downstream tasks. Hierarchical information derived from concept ontologies can reveal semantic similarity (Leacock and Chodorow, 1998; Ponzetto and Strube, 2007), and is widely applied in enhancing classification models (Rousu et al., 2005; Weinberger and Chapelle, 2009) and knowledge representation learning models (Hu et al., 2015; Xie et al., 2016).


Similar to concept hierarchies, some recent works try to exploit semantic connections from relation hierarchies. In the field of relation extraction, Han et al. (2018a) propose a hierarchical attention scheme to alleviate the noise in distant supervision. Zhang et al. (2019) leverage implicit hierarchical knowledge from KBs and propose coarse-to-fine grained attention for long-tail relations. However, these methods are designed to identify pre-defined relations, and cannot be applied to OpenRE, which aims to discover novel relations in open-domain corpora.

3 OHRE Framework

We divide the open hierarchical relation extraction problem into two phases: (1) learning relation representations with hierarchical information, and (2) clustering and linking novel relations to existing hierarchies.

3.1 Relation Representation Learning

Learning relation representations is fundamental to open hierarchical relation extraction. We encode sentences into relation representations using a relation embedding encoder. We assume existing relations are organized in hierarchies, which is common in most modern KBs. Note that while Figure 1 shows one hierarchy tree, the relation hierarchies may contain multiple trees. To fully utilize hierarchy information, we design a dynamic hierarchical triplet objective that integrates hierarchy information into relation representations, and hierarchical curriculum learning for robust model training. Pair-wise virtual adversarial training is also introduced to improve the generalization ability of the representations.

Relation Embedding Encoder. We adopt a CNN to encode sentences into relation representations. Following previous works (Zeng et al., 2014), given a sentence s and target entity pair (e_h, e_t), each word in the sentence is first transformed into an input representation by concatenating its word embedding with position embeddings indicating the relative position of each entity. The input representations are then fed into a convolutional layer followed by a max-pooling layer and a fully-connected layer to obtain the relation representation v ∈ R^d. The relation representation is normalized by its L2 norm, i.e., \|v\|_2 = 1.

Figure 2: OHRE samples triplets from relations in the hierarchy following a shallow-to-deep paradigm and sets a dynamic margin via relation distance in the hierarchy.

The relation encoder can be denoted as:

v = \mathrm{CNN}(s, e_h, e_t). \quad (1)

After obtaining relation representations, we measure the similarity of two relation instances by the squared Euclidean distance between their representations:

d(v_1, v_2) = \| v_1 - v_2 \|_2^2. \quad (2)
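For concreteness, here is a minimal PyTorch sketch of this encoder and distance. It uses the hyperparameters reported in the appendix (50-d word embeddings, two 5-d position embeddings, kernel size 3, 64-d relation embeddings); the class and argument names are illustrative, and the hidden size of 230 is our assumption rather than a value from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationEncoder(nn.Module):
    """CNN relation encoder: v = CNN(s, e_h, e_t), with ||v||_2 = 1 (Eq. 1)."""
    def __init__(self, vocab_size, max_len=120, word_dim=50, pos_dim=5,
                 hidden_dim=230, rel_dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # Two position embeddings encode each token's offset to the head/tail
        # entity; offsets are assumed to be shifted into [0, 2 * max_len).
        self.pos_emb_h = nn.Embedding(2 * max_len, pos_dim)
        self.pos_emb_t = nn.Embedding(2 * max_len, pos_dim)
        self.conv = nn.Conv1d(word_dim + 2 * pos_dim, hidden_dim,
                              kernel_size=3, padding=1)
        self.fc = nn.Linear(hidden_dim, rel_dim)

    def forward(self, tokens, pos_h, pos_t):
        # tokens, pos_h, pos_t: (batch, seq_len) index tensors.
        x = torch.cat([self.word_emb(tokens),
                       self.pos_emb_h(pos_h),
                       self.pos_emb_t(pos_t)], dim=-1)   # (B, L, D)
        h = F.relu(self.conv(x.transpose(1, 2)))         # (B, H, L)
        h = h.max(dim=2).values                          # max-pooling over positions
        v = self.fc(h)
        return F.normalize(v, p=2, dim=-1)               # L2-normalize: ||v||_2 = 1

def distance(v1, v2):
    """Squared Euclidean distance between representations (Eq. 2)."""
    return ((v1 - v2) ** 2).sum(dim=-1)
```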

Dynamic Hierarchical Triplet Loss. To effectively integrate relation hierarchy information into relation representations, we propose a dynamic hierarchical triplet loss for instance representation learning. Triplet loss is widely used in metric learning, where it encourages a static margin between different categories for distinguishment (Schroff et al., 2015). We argue that good relation representations should also reflect hierarchical information: relations with close semantics in the hierarchy should share similar representations. As the example in Figure 2 shows, r_i^1 and r_j^1 should be closer than r_i^2 and r_j^2 in the representation space, since r_i^1 and r_j^1 are close to each other in the relation hierarchy.

We design a hierarchical triplet objective with a dynamic margin determined by the distance between relations in the hierarchy. Specifically, the dynamic margin is applied over the instances of the relations. As shown in Figure 2, given two relations r_i and r_j sampled by the hierarchical curriculum training strategy (introduced below), we randomly sample two instances (an anchor instance a and a positive instance p) from r_i, and one instance (a negative instance n) from r_j. The hierarchical triplet objective requires the model to distinguish the positive pair (a, p) from the negative pair (a, n) by a distance margin, which is dynamically determined by the length of the shortest path between r_i and r_j in the hierarchy, as follows:


L_t = \sum_{r_i, r_j \sim T} \max\left[0,\; d(v_a, v_p) + \lambda_d \, \frac{l(r_i, r_j)}{1 + l(r_i, r_j)} - d(v_a, v_n)\right], \quad (3)

where λ_d is a hyperparameter, l(r_i, r_j) is the length of the shortest path between r_i and r_j in the hierarchy,² and T is the curriculum training strategy introduced below. Intuitively, the margin increases with the length of the shortest path in the hierarchy, with a relative emphasis on distinguishing nearby relations. Compared to the static margin in the vanilla triplet loss, the dynamic hierarchical margin can capture the semantic similarities of relations in the hierarchy, leading to representations that serve not only novel relation clustering but also effective relation hierarchy expansion.

Hierarchical Curriculum Learning. In addition to providing direct supervision for representation learning, the relation hierarchy can also provide signals for robust model training. We propose a hierarchical training paradigm, a curriculum learning strategy (Bengio et al., 2009) that enables progressive training. The motivation is intuitive: in the early period of training, we choose relations that are easy for the model to distinguish, and gradually transfer to harder ones. Specifically, we sample two relations from the same layer in the hierarchy that share ancestor relations (i.e., the relations come from the same tree and are of the same depth), with a gradual transition from shallow to deep layers with respect to their common ancestor, as shown in Figure 2. This training procedure leads the model to learn relations from coarse to fine grains, since the length of the shortest path between two relations in the hierarchy gradually increases as the relation pair goes deeper.³ In experiments, we find it beneficial to warm up the training of OHRE under the hierarchical training paradigm, and then switch to sampling two random relations in the later phase.

² The margin is 1 if the two relations come from different trees.
³ Relations with longer shortest paths are more difficult for the model, since they need to be distinguished by larger margins, as indicated in Equation 3.
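To make the objective concrete, here is a minimal PyTorch sketch of Equation 3 for a batch of sampled triplets; λ_d = 0.7 follows the appendix, the handling of cross-tree pairs follows footnote 2, and the function names are illustrative rather than from the released code.

```python
import torch
import torch.nn.functional as F

def dynamic_margin(path_len, lambda_d=0.7, same_tree=True):
    # Margin term of Eq. 3: lambda_d * l / (1 + l), where l = l(r_i, r_j) is the
    # shortest-path length between the two relations in the hierarchy. The margin
    # grows with l, putting relative emphasis on separating nearby relations.
    # Footnote 2: the margin is 1 if the relations come from different trees.
    if not same_tree:
        return 1.0
    return lambda_d * path_len / (1.0 + path_len)

def hierarchical_triplet_loss(v_a, v_p, v_n, path_len,
                              lambda_d=0.7, same_tree=True):
    # Eq. 3 for a batch of triplets: anchor a and positive p are instances of
    # r_i, negative n is an instance of r_j; v_* are L2-normalized vectors.
    d_ap = ((v_a - v_p) ** 2).sum(dim=-1)        # d(v_a, v_p), Eq. 2
    d_an = ((v_a - v_n) ** 2).sum(dim=-1)        # d(v_a, v_n)
    margin = dynamic_margin(path_len, lambda_d, same_tree)
    return F.relu(d_ap + margin - d_an).mean()   # max[0, .], averaged over triplets
```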

Pair-wise Virtual Adversarial Training. Neural metric learning models may suffer from overfitting by learning very complex decision hyperplanes. In our case, the problem is severe since relation hierarchies provide strong supervision to metric learning. To address this issue, we design pair-wise virtual adversarial training, which smooths the representation space by penalizing sharp changes in it. Specifically, for each randomly sampled instance pair, we add worst-case perturbations such that the distance between the relation pair changes maximally. We penalize the loss changes as follows:

L_v = \sum_{v_1, v_2} \left\| d(v_1, v_2) - d(\tilde{v}_1, \tilde{v}_2) \right\|_2^2, \quad (4)

where \tilde{v} is obtained by adding worst-case noise to v. Pair-wise virtual adversarial training encourages a smooth and robust metric space, thus improving the generalization ability of OpenRE models. Unlike previous works that adopt virtual adversarial training in classification problems (Miyato et al., 2017; Wu et al., 2019), our pair-wise virtual adversarial training is based on distances in Euclidean space instead of classification probability distributions. We refer readers to the appendix for more details about pair-wise virtual adversarial training. The final loss is defined as the sum of the dynamic hierarchical triplet loss L_t and the pair-wise virtual adversarial loss L_v:

L = L_t + \lambda_v L_v, \quad (5)

where λ_v is a hyperparameter.
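For reference, here is a minimal PyTorch sketch of the pair-wise virtual adversarial loss, following the two-step perturbation procedure described in the appendix (a random δ₁ and a gradient-based δ₂, both rescaled to L2 norm 0.02). The function decomposition is ours, not the released implementation.

```python
import torch

def l2_scale(delta, eps=0.02):
    # Rescale a perturbation so that its L2 norm is eps.
    return eps * delta / (delta.norm(p=2, dim=-1, keepdim=True) + 1e-12)

def pairwise_vat_loss(v1, v2, eps=0.02):
    # Pair-wise virtual adversarial loss (Eq. 4): delta_1 is sampled uniformly
    # in [0, 1) and rescaled; one gradient step estimates the worst-case delta_2.
    x1, x2 = v1.detach(), v2.detach()            # perturb around fixed points
    d_clean = ((v1 - v2) ** 2).sum(dim=-1)
    d1_a = l2_scale(torch.rand_like(x1), eps).requires_grad_(True)
    d1_b = l2_scale(torch.rand_like(x2), eps).requires_grad_(True)
    d_pert = (((x1 + d1_a) - (x2 + d1_b)) ** 2).sum(dim=-1)
    change = ((d_clean.detach() - d_pert) ** 2).sum()
    g_a, g_b = torch.autograd.grad(change, [d1_a, d1_b])
    # delta_2: rescaled gradient direction, applied to the live representations.
    d_adv = (((v1 + l2_scale(g_a, eps)) -
              (v2 + l2_scale(g_b, eps))) ** 2).sum(dim=-1)
    return ((d_clean - d_adv) ** 2).mean()       # Eq. 4, averaged over pairs
```

In training, this term would be combined with the triplet objective as L = L_t + λ_v L_v (Equation 5).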

3.2 Relation Hierarchy Expansion

To expand the existing relation hierarchies, we first cluster novel relations in open-domain corpora based on instance representations, and then learn relation prototypes for both the relations in the existing hierarchy and the novel relations. Finally, new relations are inserted into the existing relation hierarchy by a novel top-down hierarchy expansion algorithm based on relation prototypes.

The hierarchy expansion framework is designed based on two key assumptions: (1) a relation prototype is the aggregation of all instances belonging to itself and its descendant relations; (2) a relation prototype has the highest similarity with its parent relation prototype, and lower similarity with its sibling relation prototypes. The rationale is that the semantics of a relation is typically covered by its ancestors. The assumptions are also aligned with the intuition in relation representation learning, where a relation exhibits the highest similarity with its parent due to the minimum shortest-path length (i.e., the length is 1).

Relation Prototype Learning. We first cluster new relations in unsupervised data with the Louvain algorithm (Blondel et al., 2008).


Louvain detects communities in a graph by greedily merging data points into clusters based on modularity optimization, and has proven effective in OpenRE (Wu et al., 2019). We construct a weighted undirected graph of the relation instances in the test set, where the connection weight between two instances is determined by the distance between their representations:

w(v_1, v_2) = \max[0,\, 1 - d(v_1, v_2)]. \quad (6)

In experiments, we observe that clusters containing very few instances are typically noisy outliers and should not be regarded as novel relations, which is consistent with Wu et al. (2019). Therefore, we merge the instances in these clusters into their closest clusters, measured by the highest connection weight. We then learn relation prototypes for both the relations in the existing hierarchy and the novel relations based on the clusters. We represent each relation prototype with instances: the prototype of a novel relation consists of all its instances, and the prototype of an existing relation contains all instances from itself and all its descendant relations.
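A sketch of this clustering step, assuming the python-louvain package and NumPy arrays of L2-normalized representations; the merging of small noisy clusters is noted but omitted for brevity.

```python
import itertools
import networkx as nx
import numpy as np
import community as community_louvain  # pip install python-louvain

def cluster_instances(vectors, edge_threshold=0.5):
    # Build the weighted undirected instance graph of Sec. 3.2 and cluster it
    # with Louvain. `vectors` is an (N, d) array of L2-normalized vectors.
    n = len(vectors)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for i, j in itertools.combinations(range(n), 2):
        d = float(np.sum((vectors[i] - vectors[j]) ** 2))  # Eq. 2
        w = max(0.0, 1.0 - d)                              # Eq. 6
        if w > edge_threshold:   # appendix: ignore similarities below 0.5
            g.add_edge(i, j, weight=w)
    # Maps instance index -> cluster id; small noisy clusters would then be
    # merged into their closest clusters by connection weight (omitted here).
    return community_louvain.best_partition(g, weight='weight')
```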

Top-Down Hierarchy Expansion. After obtaining relation prototypes, we link the extracted relations to the existing hierarchy by a novel top-down hierarchy expansion algorithm. Following the aforementioned assumptions, for each novel relation, the algorithm finds the parent with the highest similarity in a top-down paradigm.

Specifically, for each novel relation, starting from the existing root relations, we iteratively search for the relation with the highest similarity among the candidates, layer by layer. In each layer, the search candidates are the child relations of the search result in the previous layer. The search process terminates if the similarity decreases compared to the previous layer. The extracted relation is inserted as the child of the most similar relation, or cast as a singleton if the highest similarity is lower than a threshold; a higher expansion threshold leads to more singleton relations. The procedure is shown in Algorithm 1, and we refer readers to the experiments for a detailed example. In practice, the similarity between a novel relation and an existing relation is given by the average connection between their prototypes as follows:

S(r_i, r_j) = \frac{\sum_{v_1 \in P_i} \sum_{v_2 \in P_j} w(v_1, v_2)}{|P_i| \cdot |P_j|} \cdot \sqrt{1 + |P_j^s|}, \quad (7)

where r_i is a novel relation and r_j is an existing relation, P_i and P_j are the corresponding relation prototypes, and |P_j^s| is the number of descendant relations of r_j. In experiments, we find that relations with more descendant relations in the hierarchy tend to exhibit lower average connections with novel relations, due to the margins between the contained descendant relations. By introducing \sqrt{1 + |P_j^s|}, we balance the connection strength and encourage the model to explore wider and deeper hierarchies.

Algorithm 1: Top-Down Hierarchy Expansion
Require: r: a novel relation
Require: λ_W: expansion threshold
1: Init search candidates C = root relations of trees
2: Init highest similarity in previous layer W = 0
3: while C not empty do
4:   Search relation c = argmax_{c ∈ C} S(r, c)
5:   if S(r, c) > W then
6:     // Move to the next layer
7:     Update highest similarity W = S(r, c)
8:     Update search candidates C = children of c
9:   else
10:    Stop searching
11: if W ≥ λ_W then
12:   Expand r as child of c
13: else
14:   Cast r as singleton relation
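A compact sketch of Equation 7 and Algorithm 1. The `children` accessor and the prototype containers are assumed interfaces, and the threshold 0.2 is the value from the appendix.

```python
import math

def connection_weight(v1, v2):
    # Eq. 6 on instance representations.
    return max(0.0, 1.0 - float(((v1 - v2) ** 2).sum()))

def similarity(proto_novel, proto_existing, n_descendants):
    # Eq. 7: average connection between prototypes, scaled by sqrt(1 + |P_j^s|)
    # to encourage exploring wider and deeper hierarchies.
    total = sum(connection_weight(a, b)
                for a in proto_novel for b in proto_existing)
    avg = total / (len(proto_novel) * len(proto_existing))
    return avg * math.sqrt(1 + n_descendants)

def top_down_expand(novel_proto, roots, children, sim, threshold=0.2):
    # Algorithm 1. `roots` are the root relations of the hierarchy trees,
    # `children(r)` returns the child relations of r, and `sim(novel, r)`
    # evaluates Eq. 7 against relation r's prototype.
    candidates, best, best_score = list(roots), None, 0.0
    while candidates:
        c = max(candidates, key=lambda r: sim(novel_proto, r))
        score = sim(novel_proto, c)
        if score > best_score:                  # similarity still increasing:
            best, best_score = c, score         # move down to the next layer
            candidates = list(children(c))
        else:
            break                               # similarity decreased: stop
    return best if best_score >= threshold else None  # None: singleton relation
```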

The reasons for expanding the hierarchy with a top-down paradigm are threefold: (1) the coarse-to-fine-grained hierarchy expansion procedure is biologically plausible, as suggested by cognitive neuroscience studies (Tenenbaum et al., 2011); (2) the decision-making procedure, which follows the existing hierarchy structure, is interpretable; (3) it achieves better efficiency, since unlikely branches are pruned in the early search stage.

4 Experiments

To verify the effectiveness of hierarchical information and OHRE, we conduct comprehensive experiments on relation clustering and hierarchy expansion on two real-world datasets. We also conduct a detailed analysis of OHRE to provide a better understanding of our framework. We refer readers to the appendix for more implementation details.

4.1 Dataset

Following previous works (Wu et al., 2019; Hu et al., 2020), we evaluate our framework on FewRel (Han et al., 2018b) and the New York Times-Freebase (NYT-FB) dataset (Marcheggiani and Titov, 2016).


However, the original random data splits are not suitable to benchmark the open hierarchical relation extraction task, since the test sets do not well cover different topologies in the relation hierarchy. In the test sets, the parent relations of a majority of relations are not labeled with sentences in the dataset, making them singleton relations. It is desirable to include more diverse and challenging relations with complex topologies in the test sets. We therefore re-split these two datasets to better approximate real-world needs and provide benchmarks for them. Considering applications where only incomplete relation hierarchies are available, we use only a partial hierarchy from the KBs, removing the hierarchy of relations beyond the train sets.

FewRel Hierarchy. FewRel (Han et al., 2018b) is a supervised dataset created from Wikipedia and Wikidata. Following Wu et al. (2019), the train set includes 64 relations, where each relation has 700 instances. The development set and test set share 16 relations, and each set has 1,600 instances. We exchange relations between the original train and test sets to include three relation topologies in the test set: (1) single relations without a parent (6 relations), (2) relations with a parent in the train set (8 relations), and (3) relations with a parent in the test set (2 relations). We call this dataset FewRel Hierarchy.

NYT-FB Hierarchy. NYT-FB (Marcheggiani and Titov, 2016) is a distantly supervised dataset created from the New York Times and Freebase. Following Simon et al. (2019), we filter out sentences with non-binary relations. The train set includes 212 relations with 33,992 instances. The development set and test set share 50 relations, and have 3,835 and 3,858 instances respectively. Each relation in the development set and test set has at least 10 instances. We call this dataset NYT-FB Hierarchy.

4.2 Experimental Settings

We introduce two task settings and corresponding evaluation metrics. (1) The relation clustering setting is widely adopted in previous OpenRE works to evaluate the ability to cluster novel relations (Marcheggiani and Titov, 2016; Wu et al., 2019). (2) We also design the hierarchy expansion setting to thoroughly test the ability of OpenRE models to expand existing relation hierarchies.

4.2.1 Relation Clustering Setting

Relation clustering is a traditional OpenRE setting, where models are required to cluster instances into different groups representing new relations.

Baselines. We compare OHRE with state-of-the-art OpenRE baselines. (1) Relational Siamese Network augmented with conditional entropy and virtual adversarial training (RSN-CV) (Wu et al., 2019) is the state-of-the-art OpenRE method, which transfers relational knowledge from labeled data to discover relations in unlabeled data. (2) SelfORE (Hu et al., 2020) utilizes self-training to iteratively learn relation representations and clusters. (3) HAC with re-weighted word embeddings (RW-HAC) (Elsahar et al., 2017) is the state-of-the-art rich-feature-based method. RW-HAC first extracts rich features, such as entity types, then reduces the feature dimension via principal component analysis, and finally clusters the features with HAC. (4) The discrete-state variational autoencoder (VAE) (Marcheggiani and Titov, 2016) optimizes a relation classifier via reconstruction signals, with rich features including dependency paths and POS tags.

Evaluation Metrics. Following Wu et al. (2019) and Hu et al. (2020), we adopt instance-level evaluation metrics for relation clustering, including B³ (Bagga and Baldwin, 1998), V-measure (Rosenberg and Hirschberg, 2007), and Adjusted Rand Index (ARI) (Hubert and Arabie, 1985). We refer readers to the appendix for more detailed descriptions of the evaluation metrics.

4.2.2 Hierarchy Expansion Setting

In this setting, models are required to first cluster novel relations, and then add the extracted relations into the existing hierarchy from the train set.

Baselines. To the best of our knowledge, there are no existing OpenRE methods designed to directly expand an existing relation hierarchy. We therefore design two strong baselines based on state-of-the-art OpenRE architectures. (1) RW-HAC for hierarchy expansion (RW-HAC-HE) links each novel relation cluster given by RW-HAC to the existing relation cluster with the globally highest Ward's linkage score. The novel relation becomes a singleton if the highest score is less than a threshold. (2) RSN-CV for hierarchy expansion (RSN-CV-HE) obtains clusters using RSN-CV and links them to the hierarchy using our top-down expansion algorithm. Where no confusion arises, we omit the -HE suffixes in model names in the experiment results.

Evaluation Metrics. We adopt two cluster-level metrics: (1) how well a predicted cluster matches the golden cluster, by the matching metric (Larsen and Aone, 1999), and (2) how well the predicted cluster links to the golden position in the hierarchy, by the taxonomy metric (Dellschaft and Staab, 2006).


Dataset           Model                                 B³ F1  Prec.  Rec.  | V-F1  Hom.  Comp. | ARI
FewRel Hierarchy  VAE (Marcheggiani and Titov, 2016)    23.0   14.2   61.4  | 24.1  17.7  37.9  |  4.9
                  RW-HAC (Elsahar et al., 2017)         32.7   28.0   39.4  | 39.7  36.0  44.4  | 12.4
                  SelfORE (Hu et al., 2020)             60.6   60.1   61.1  | 70.1  69.5  70.7  | 54.6
                  RSN-CV (Wu et al., 2019)              63.8   57.4   71.7  | 72.4  68.9  76.2  | 54.2
                  OHRE                                  70.5   64.5   77.7  | 76.7  73.8  79.9  | 64.2
NYT-FB Hierarchy  VAE (Marcheggiani and Titov, 2016)    25.2   17.6   44.4  | 35.1  28.2  46.3  | 10.5
                  RW-HAC (Elsahar et al., 2017)         35.0   43.3   29.4  | 58.9  61.7  56.3  | 28.3
                  SelfORE (Hu et al., 2020)             38.1   42.6   34.5  | 59.0  60.7  57.5  | 30.4
                  RSN-CV (Wu et al., 2019)              38.9   26.3   74.2  | 44.1  74.3  55.4  | 26.2
                  OHRE                                  43.8   31.4   72.3  | 60.0  49.9  75.3  | 31.9

Table 1: Relation clustering results on two datasets (%).

Model                  B³ F1  V-F1  ARI
RSN-CV                 63.8   72.4  54.2
  w/o VAT              53.3   65.0  43.2
OHRE                   70.5   76.7  64.2
  w/o Dynamic Margin   68.9   76.1  63.5
  w/o Curriculum Train 68.5   75.7  62.1
  w/o Pair-wise VAT    58.3   68.8  49.5

Table 2: Ablation results on FewRel Hierarchy (%).

We also report two overall evaluation metrics that consider both relation clustering and hierarchy expansion results: the arithmetic mean and the harmonic mean of matching F1 and taxonomy F1.

4.3 Relation Clustering Results

Main Results. Table 1 shows the relation clustering results on two datasets, from which we observe that: (1) OHRE outperforms state-of-the-art models by a large margin, e.g., with 6.7%, 4.3%, and 9.6% improvements in B³, V-measure, and ARI respectively on FewRel Hierarchy. Compared with unsupervised methods, the performance gap is even greater, e.g., more than 30% in B³ on FewRel Hierarchy. This shows that OHRE can effectively leverage an existing relation hierarchy for better novel relation clustering. (2) The improvements of OHRE are consistent on both the supervised FewRel Hierarchy dataset and the distantly supervised NYT-FB Hierarchy dataset. This indicates that the representation learning and relation clustering procedure of OHRE is robust to noisy relation labels and long-tail relations in different domains. We note that although our model adopts a CNN as the relation encoder, it outperforms SelfORE equipped with BERT (Devlin et al., 2019).

We expect it would be beneficial to enhance the relation representations in OHRE with pre-trained language models, and we leave this for future work.

Ablation Study. We conduct ablations to investigate the contribution of different components, as shown in Table 2. For fair comparison, we also ablate virtual adversarial training from RSN-CV (Wu et al., 2019). Experimental results show that all components contribute to the final performance, which indicates that hierarchical information from existing relations provides transferable guidance for novel relation clustering. The performance drops most significantly when removing pair-wise virtual adversarial training, indicating the importance of space smoothing to the generalization of OHRE.

4.4 Hierarchy Expansion Results

Main Results. Table 3 shows the results of hierarchy expansion, from which we observe that: (1) OHRE outperforms strong baselines on hierarchy expansion. Compared to the baselines, OHRE achieves higher matching F1, which indicates that relations extracted by OHRE are better aligned with golden relations on the cluster level. Moreover, the advantage in taxonomy F1 shows that OHRE can better add the extracted relations into the existing hierarchy. The reasonable overall results show the potential of OHRE in real-world open hierarchical relation extraction applications. (2) We also conduct hierarchy expansion experiments with golden novel clusters. However, the results show no obvious improvements for any model. In particular, we note that while RW-HAC and RSN-CV achieve seemingly reasonable performance, they always cast a novel relation as a singleton and are unable to add the relation to the right place in the hierarchy.⁴

⁴ The proportion of singleton relations is 37.5%.


Figure 3: OHRE workflow in expanding an existing hierarchy with novel relations, with a t-SNE visualization on FewRel Hierarchy. (a) The expansion recommendation by OHRE on an existing relation hierarchy; the average connection score at each time step is shown. (b) OHRE first clusters novel relations from open-domain corpora and learns relation prototypes. (c) OHRE then expands the relation hierarchy based on relation prototypes in a top-down paradigm. Relations with labeled instances in the dataset are marked in color. Relations in the existing hierarchy are marked with solid lines, and novel relations with dashed lines. Best viewed in color.

Dataset           Method   Golden Cluster | Match F1  Prec.  Rec. | Tax. F1  Prec.  Rec. | Arith. F1 | Harm. F1
FewRel Hierarchy  RW-HAC   no             | 33.2      33.9   37.6 | 37.5     37.5   37.5 | 35.3      | 35.2
                  RSN-CV   no             | 69.6      63.7   85.8 | 34.5     38.5   31.3 | 52.0      | 46.1
                  OHRE     no             | 78.5      73.6   88.4 | 53.3     57.1   50.0 | 65.9      | 63.5
                  RW-HAC   yes            | N/A                   | 43.8     43.8   43.8 | 71.9      | 60.9
                  RSN-CV   yes            | N/A                   | 37.5     37.5   37.5 | 68.8      | 54.5
                  OHRE     yes            | N/A                   | 57.4     62.5   53.1 | 78.7      | 73.0
NYT-FB Hierarchy  RW-HAC   no             | 29.6      34.3   34.0 | 10.1      8.7   12.0 | 19.8      | 15.0
                  RSN-CV   no             | 45.1      33.2   83.1 | 10.5     15.2    8.0 | 27.8      | 17.0
                  OHRE     no             | 51.7      42.7   76.2 | 22.3     23.9   21.0 | 37.0      | 31.2
                  RW-HAC   yes            | N/A                   | 20.0     16.7   25.0 | 60.0      | 33.3
                  RSN-CV   yes            | N/A                   | 13.0     16.0   11.0 | 56.5      | 23.1
                  OHRE     yes            | N/A                   | 23.0     23.0   23.0 | 61.5      | 37.4

Table 3: Hierarchy expansion results. "Golden Cluster" indicates that the golden relation clusters are given, in which case the matching metric for relation clustering is not applicable. Arith.: arithmetic mean; Harm.: harmonic mean.

          Relation Clustering        Hierarchy Expansion
Model     sgl.   p-trn.  p-tst.     sgl.   p-trn.  p-tst.
RW-HAC    31.6   35.0    42.8       60.0    0.0     0.0
RSN-CV    67.1   77.8    64.4       58.8    0.0     0.0
OHRE      75.2   84.6    53.9       58.8   36.4     0.0

Table 4: Relation clustering (B³ F1) and hierarchy expansion (taxonomy F1) results on relations with different hierarchy topologies. sgl.: relations without a parent; p-trn.: parent in train set; p-tst.: parent in test set.

This is because inconsistent instance representations within each golden cluster mislead the cluster-level expansion procedure, which shows that integrating hierarchy information into relation representations is of fundamental importance to hierarchy expansion. Besides, the results also show the necessity of re-splitting FewRel to include more hierarchy topologies in the test set for a better benchmark.

Zoom-in Study. To better understand the performance of models on hierarchy expansion, we divide the relations according to their hierarchy topologies and report the performance on FewRel Hierarchy. Table 4 shows the results on three topologies: (1) single relations without parents (sgl.), (2) relations with parents in the train set (p-trn.), and (3) relations with parents in the test set (p-tst.). The results show that although models achieve reasonable clustering performance on all three topologies, they struggle on hierarchy expansion, especially on relations with parents. In comparison, OHRE can handle some relations with parents in the train set; however, there is still ample room for improvement. This shows that hierarchy expansion is challenging, and we leave further research to future work.

4.5 Case Study

To intuitively show how OHRE expands an existing hierarchy with novel relations from open-domain corpora, we visualize the workflow of OHRE on the relation composer, as shown in Figure 3. The average connection score increases as the expansion


procedure progresses from top to bottom in the hierarchy, and the expansion terminates when the connection score decreases. The process is not only better aligned with real-world needs, but also provides better interpretability in decision making.

5 Conclusion

In this work, we make the first attempt to address the bidirectional connections between OpenRE and relation hierarchy. In the future, we believe the following directions are worth exploring: (1) We use a heuristic method to add new relations into hierarchies based on local similarities between relations. More advanced methods could model the global interaction between new relations and the hierarchy, and learn to effectively add the novel relations. (2) We conduct relation representation learning and hierarchy expansion in a pipeline. End-to-end models could jointly optimize these phases for better open hierarchical relation extraction results.

6 Acknowledgement

This work is supported by the National Key Research and Development Program of China (No. 2020AAA0106501). Yao is also supported by the 2020 Tencent Rhino-Bird Elite Training Program.

References

Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging linguistic structure for open domain information extraction. In Proceedings of ACL-IJCNLP, pages 344–354. ACL.

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735. Springer.

Amit Bagga and Breck Baldwin. 1998. Entity-based cross-document coreferencing using the vector space model. In Proceedings of ACL-COLING, pages 79–85. ACL.

Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of IJCAI, pages 2670–2676. ACM.

Michele Banko and Oren Etzioni. 2008. The tradeoffs between open and traditional relation extraction. In Proceedings of ACL: HLT, pages 28–36. ACL.

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of ICML, pages 41–48. ACM.

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250.

Lei Cui, Furu Wei, and Ming Zhou. 2018. Neural open information extraction. In Proceedings of ACL, pages 407–413. ACL.

Klaas Dellschaft and Steffen Staab. 2006. On how to perform a gold standard based evaluation of ontology learning. In Proceedings of ISWC, pages 228–241. Springer.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL: HLT, pages 4171–4186. ACL.

Hady Elsahar, Elena Demidova, Simon Gottschalk, Christophe Gravier, and Frederique Laforest. 2017. Unsupervised open relation extraction. In Proceedings of ESWC, pages 12–16. Springer.

Xu Han, Pengfei Yu, Zhiyuan Liu, Maosong Sun, and Peng Li. 2018a. Hierarchical relation extraction with coarse-to-fine grained attention. In Proceedings of EMNLP, pages 2236–2245. ACL.

Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, and Maosong Sun. 2018b. FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In Proceedings of EMNLP, pages 4803–4809. ACL.

Xuming Hu, Lijie Wen, Yusong Xu, Chenwei Zhang, and Philip Yu. 2020. SelfORE: Self-supervised relational feature learning for open relation extraction. In Proceedings of EMNLP, pages 3673–3682. ACL.

Zhiting Hu, Poyao Huang, Yuntian Deng, Yingkai Gao, and Eric Xing. 2015. Entity hierarchy embedding. In Proceedings of ACL-IJCNLP, pages 1292–1300. ACL.

Lawrence Hubert and Phipps Arabie. 1985. Comparing partitions. Journal of Classification, 2(1):193–218.

Bjornar Larsen and Chinatsu Aone. 1999. Fast and effective text mining using linear-time document clustering. In Proceedings of KDD, pages 16–22, New York, NY, USA. ACM.

Claudia Leacock and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. WordNet: An Electronic Lexical Database, 49(2):265–283.

Dekang Lin and Patrick Pantel. 2001. DIRT: Discovery of inference rules from text. In Proceedings of KDD. ACM Press.

Diego Marcheggiani and Ivan Titov. 2016. Discrete-state variational autoencoders for joint discovery and factorization of relations. TACL, 4:231–244.

Takeru Miyato, Andrew M. Dai, and Ian J. Goodfellow. 2017. Adversarial training methods for semi-supervised text classification. In Proceedings of ICLR.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of EMNLP, pages 1532–1543. ACL.

Simone Paolo Ponzetto and Michael Strube. 2007. Knowledge derived from Wikipedia for computing semantic relatedness. JAIR, 30:181–212.

Andrew Rosenberg and Julia Hirschberg. 2007. V-measure: A conditional entropy-based external cluster evaluation measure. In Proceedings of EMNLP-CoNLL, pages 410–420. ACL.

Juho Rousu, Craig Saunders, Sandor Szedmak, and John Shawe-Taylor. 2005. Learning hierarchical multi-category text classification models. In Proceedings of ICML, pages 744–751. ACM.

Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of CVPR, pages 815–823. IEEE Computer Society.

Étienne Simon, Vincent Guigue, and Benjamin Piwowarski. 2019. Unsupervised information extraction: Regularizing discriminative approaches with relation distribution losses. In Proceedings of ACL, pages 1378–1387. ACL.

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(56):1929–1958.

Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, and Ido Dagan. 2018. Supervised open information extraction. In Proceedings of NAACL: HLT, pages 885–895. ACL.

Joshua B. Tenenbaum, Charles Kemp, Thomas L. Griffiths, and Noah D. Goodman. 2011. How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285.

Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57:78–85.

Kilian Q. Weinberger and Olivier Chapelle. 2009. Large margin taxonomy embedding for document categorization. In Advances in NeurIPS, pages 1737–1744. Curran Associates, Inc.

Ruidong Wu, Yuan Yao, Xu Han, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, and Maosong Sun. 2019. Open relation extraction: Relational knowledge transfer from supervised data to unsupervised data. In Proceedings of EMNLP-IJCNLP, pages 219–228. ACL.

Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2016. Representation learning of knowledge graphs with hierarchical types. In Proceedings of IJCAI, pages 2965–2971. IJCAI/AAAI Press.

Limin Yao, Aria Haghighi, Sebastian Riedel, and Andrew McCallum. 2011. Structured relation discovery using generative models. In Proceedings of EMNLP, pages 1456–1466. ACL.

Limin Yao, Sebastian Riedel, and Andrew McCallum. 2012. Unsupervised relation discovery with sense disambiguation. In Proceedings of ACL, pages 712–720. ACL.

Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING, pages 2335–2344. ACL.

Ningyu Zhang, Shumin Deng, Zhanlin Sun, Guanying Wang, Xi Chen, Wei Zhang, and Huajun Chen. 2019. Long-tail relation extraction via knowledge graph embeddings and graph convolution networks. In Proceedings of NAACL: HLT, pages 3016–3025. ACL.


A Implementation Details

In this section, we introduce the hyperparameters and their search bounds in relation representation learning and relation hierarchy expansion. All hyperparameters are selected by grid search on the development set. Moreover, we report the average training time and the number of parameters.

Representation Learning Hyperparameters. In the embedding layer, we use 50-d GloVe (Pennington et al., 2014) word embeddings and two randomly initialized 5-d position embeddings; all embeddings are trainable. The convolution kernel size is 3, the relation embedding size is 64 (selected from {64, 128, 256, 512}), and λ_d in representation learning is 0.7 (selected from {0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}). We apply dropout (Srivastava et al., 2014) after the embedding layer with dropout rate 0.2, and L2 regularization on the convolutional and fully-connected layers with weights 2e-4 and 1e-3 respectively. During training, the batch size is 64 (selected from {16, 32, 64, 128}); for each batch, we randomly sample 4 relation types, each with 16 instances. The hierarchical curriculum learning strategy lasts 100 batches in the first epoch to warm up the model parameters. In pair-wise virtual adversarial training, we first generate a perturbation vector δ₁ for each instance representation v, where the value in each dimension follows a uniform distribution over [0, 1). The perturbation vector δ₁ is then scaled so that its L2 norm is 0.02. We add δ₁ to the instance feature and compute the worst-case perturbation δ₂ based on the gradient. Finally, δ₂ is scaled to an L2 norm of 0.02 and added to the feature of the instance to obtain ṽ.

Hierarchy Expansion Hyperparameters. In the relation clustering process, the Louvain algorithm (Blondel et al., 2008) ignores similarities between instances below the threshold 0.5. Instances of novel relation prototypes with fewer than 5 instances are merged into their closest neighbors based on the average connection weight. During hierarchy expansion, the thresholds for singleton relations in top-down expansion and in RW-HAC-HE are 0.2 and 0.1 respectively.

B Evaluation Metrics

In this section, we provide details of the evaluation metrics in the two settings.

Relation Clustering Setting. Following previous works (Wu et al., 2019; Hu et al., 2020), we adopt instance-level evaluation metrics, including B³, V-measure, and Adjusted Rand Index.

(1) B³. For each instance in the test set, B³ computes precision and recall by comparing the predicted cluster containing the instance with the golden cluster containing it. B³ then averages the precision and recall over instances and reports their harmonic mean. (2) V-measure. V-measure (Rosenberg and Hirschberg, 2007) is another instance-based measurement that further introduces conditional entropy, which imposes a stricter requirement on cluster purity: compared to B³, a few wrong instances in an otherwise pure cluster are penalized more heavily. The V-measure F1 is the harmonic mean of homogeneity and completeness. (3) Adjusted Rand Index. ARI (Hubert and Arabie, 1985) counts all pair-wise assignments to the same or different groups to measure the similarity of the predicted and golden clusterings. Random assignment yields an ARI of 0, and the maximum of 1 indicates a perfect result. Compared to the previous two metrics, ARI is less sensitive, since it is not dominated by an extreme sub-value such as precision or homogeneity.
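For concreteness, a self-contained sketch of B³ under these definitions, with cluster assignments given as dicts from instance id to cluster id (both maps are assumed to share the same keys):

```python
from collections import defaultdict

def b_cubed(pred, gold):
    """B^3: per-instance precision/recall from the overlap of the predicted
    and golden clusters containing the instance, averaged, then harmonic F1."""
    pred_clusters, gold_clusters = defaultdict(set), defaultdict(set)
    for i, c in pred.items():
        pred_clusters[c].add(i)
    for i, c in gold.items():
        gold_clusters[c].add(i)
    prec = rec = 0.0
    for i in pred:
        p_c, g_c = pred_clusters[pred[i]], gold_clusters[gold[i]]
        overlap = len(p_c & g_c)
        prec += overlap / len(p_c)   # precision of instance i
        rec += overlap / len(g_c)    # recall of instance i
    prec, rec = prec / len(pred), rec / len(pred)
    return 2 * prec * rec / (prec + rec), prec, rec
```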

Hierarchy Expansion Setting. To bridge the predicted clusters with real relations, we first match each predicted cluster to a golden cluster and then cast it as a prototype for hierarchy position evaluation. We borrow two metrics to evaluate, on the cluster level, how well a predicted cluster matches the golden cluster, and how well the predicted cluster links to the golden position in the hierarchy.

(1) Matching Metric. Similar to Larsen and Aone (1999), we match each predicted cluster to the golden relation with which it has the highest cluster-level F1 score. Note that, different from the original measurement, each golden relation can be matched only once. For each paired novel cluster and golden relation, we calculate precision, recall, and F1 score, and finally compute a weighted sum based on the number of instances.
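A sketch of this matching step; the greedy order (largest predicted clusters first) and tie handling are our assumptions, since the text only fixes the one-to-one constraint and the F1 criterion.

```python
def match_clusters(pred_clusters, gold_clusters):
    # pred_clusters / gold_clusters: dicts mapping cluster id -> set of
    # instance ids. Each golden relation may be matched at most once.
    def f1(a, b):
        overlap = len(a & b)
        if overlap == 0:
            return 0.0
        p, r = overlap / len(a), overlap / len(b)
        return 2 * p * r / (p + r)

    used, pairs = set(), {}
    # Assumed greedy order: larger predicted clusters choose first.
    for pid, p_set in sorted(pred_clusters.items(), key=lambda kv: -len(kv[1])):
        best = max((g for g in gold_clusters if g not in used),
                   key=lambda g: f1(p_set, gold_clusters[g]), default=None)
        if best is not None:
            used.add(best)
            pairs[pid] = best    # predicted cluster -> matched golden relation
    return pairs
```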

(2) Taxonomy Metric. The taxonomy metric was first proposed to evaluate taxonomy structures (Dellschaft and Staab, 2006). After matching predicted clusters to golden relations, for each predicted cluster we use the taxonomy metric to compare the position of the predicted cluster with the position of the corresponding golden relation in the hierarchy.


Assume a position p in the hierarchy is characterized by the union of all its ancestors and descendants, u(p). Denote by r_g the golden position and by r_p the predicted position of relation r in the hierarchy. The precision is defined as:

\mathrm{Prec.} = \frac{1}{|P|} \sum_{r \in P} \frac{|u(r_p) \cap u(r_g)|}{|u(r_p)|}, \quad (8)

where P is the set of predicted relation clusters. After symmetrically calculating the taxonomy recall, the taxonomy F1 is given by their harmonic mean.
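A small sketch of Equation 8, assuming the matched predicted/golden positions and the characterization u(p) are precomputed; taxonomy recall follows by dividing by |u(r_g)| instead.

```python
def taxonomy_precision(positions, u):
    # positions: dict mapping relation r -> (predicted_pos, golden_pos).
    # u: dict mapping a position p -> set of its ancestors and descendants.
    total = 0.0
    for r, (rp, rg) in positions.items():
        total += len(u[rp] & u[rg]) / len(u[rp])   # per-relation term of Eq. 8
    return total / len(positions)
```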

(3) Overall Evaluation Metric. To give a global evaluation of the open hierarchical relation extraction problem, we propose the overall evaluation metric, which combines the matching metric and the taxonomy metric by arithmetic mean and harmonic mean, giving an overall score that considers both cluster-level and taxonomy-level performance.
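These two overall scores reduce to simple means of the two F1 values; a trivial sketch:

```python
def overall_scores(match_f1, taxonomy_f1):
    # Overall evaluation: arithmetic and harmonic means of matching F1 and
    # taxonomy F1 (both on the same scale, e.g. percentages).
    arith = (match_f1 + taxonomy_f1) / 2
    total = match_f1 + taxonomy_f1
    harm = 2 * match_f1 * taxonomy_f1 / total if total > 0 else 0.0
    return arith, harm
```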

