arXiv:2108.03861v2 [cs.CL] 7 Sep 2021


Knowledge Graph Augmented Political Perspective Detection in News Media

Shangbin Feng1, Minnan Luo1, Zilong Chen1, Qingyao Li1, Xiaojun Chang2, Qinghua Zheng1

1 Xi’an Jiaotong University, Xi’an, China; 2 RMIT University, Melbourne, Australia

Abstract

Identifying political perspective in news media has become an important task due to the rapid growth of political commentary and the increasingly polarized political ideologies. Previous approaches tend to focus on leveraging textual information and leave out the rich social and political context that helps individuals identify stances. To address this limitation, in this paper, we propose a perspective detection method that incorporates external knowledge of real-world politics. Specifically, we construct a political knowledge graph with 1,071 entities and 10,703 triples. We then construct heterogeneous information networks to represent news documents, which jointly model news text and external knowledge. Finally, we apply gated relational graph convolutional networks and conduct political perspective detection as graph-level classification. Extensive experiments demonstrate that our method consistently achieves the best performance and outperforms state-of-the-art methods by at least 5.49%. Ablation studies further bear out the necessity of external knowledge and the effectiveness of our graph-based approach.

Introduction

The past decade has witnessed dramatic changes in political commentary in two major ways (Wilson, Parker, and Feinberg 2020; Hong and Kim 2016; Anspach and Carlson 2020). Firstly, gone are the days when journalists and columnists in prestigious media outlets were the only source of news articles. The popularity of social media has provided viable venues for everyday individuals to express their political opinions freely, which has led to a dramatic increase in the volume of political commentary. Secondly, the ever-increasing polarization of political ideologies has made it hard for news articles to remain impartial. Detecting political perspectives in news media would help alleviate the “echo chamber” issue, where only a single viewpoint is reiterated and the divide deepens further. Political perspective detection has therefore become a pressing task that calls for further research efforts.

Previous research on news bias detection has focused on analyzing the textual content of news articles. Natural language encoders (Kiros et al. 2015; Yang et al. 2016) and pre-trained language models (Devlin et al. 2018) are adopted by (Li and Goldwasser 2021) to analyze news content and conduct stance detection. (Jiang et al. 2019) uses convolutional neural networks with Glove word embeddings (Pennington, Socher, and Manning 2014) for political perspective detection and achieves the best result in the hyperpartisan news detection task in SemEval 2019 (Kiesel et al. 2019). (Li and Goldwasser 2021) leverages the attention mechanism and entity mentions in the news document and achieves state-of-the-art results.

Figure 1: Example of a news piece and social and political entities that require external knowledge to understand, e.g., their home state, party affiliation and voting records.

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

However, these text-centric methods fail to incorporate the external knowledge that pervades news articles, while human readers rely on this external knowledge as background information to facilitate commonsense reasoning and perspective detection. For example, Figure 1 presents an article that contains real-world entities such as political organizations, elected officials and geographical locations. External knowledge of these entities informs the reader that they are conservative Republicans, which helps to identify the liberal stance expressed in criticizing them. This reasoning demonstrates that external knowledge provides background information and serves as an essential indicator of author perspectives. Robust perspective detectors should therefore leverage external knowledge about entities discussed in news articles to boost task performance.

In light of the necessity of leveraging external knowledge, we propose a political perspective detection method that incorporates the rich social and political context to boost task performance. We first construct a knowledge graph of contemporary politics to represent the external knowledge that serves as background for political narratives. We then learn representations for entities and relations in the knowledge graph with knowledge graph embedding techniques. After that, we construct a heterogeneous information network to represent news documents, which includes both textual content and mentioned entities in the knowledge graph. Finally, we apply gated relational graph convolutional networks for graph classification and conduct political perspective detection. Our main contributions are summarized as follows:

• We construct a knowledge graph of contemporary U.S. politics from Wikipedia pages to serve as external knowledge for political perspective detection. This knowledge graph could also serve as external knowledge for other tasks such as misinformation detection.

• We propose to leverage the rich social and political context and model news documents as heterogeneous information networks for political perspective detection. Our approach is end-to-end and inductive, and it frames the task of stance detection as graph-level classification to incorporate external knowledge.

• We conduct extensive experiments to evaluate our method and competitive baselines. Our method outperforms all state-of-the-art approaches. Further ablation studies bear out the necessity of external knowledge and the effectiveness of constructing a heterogeneous information network for political perspective detection.

Related Work

In this section, we briefly review related literature on political perspective detection and knowledge graph applications.

Political Perspective Detection

The problem of political perspective detection is generally studied in two different settings: social media and news media. For stance detection in social media, many works have studied the problem as text classification to identify stance in specific posts based on their textual content. They used techniques such as sentiment analysis (Jiang et al. 2011; Wang et al. 2017), n-gram language models (Mohammad et al. 2016) and different neural network architectures (Augenstein et al. 2016; Du et al. 2017; Xu et al. 2018). Other approaches explored the problem of identifying stances of social media users instead of individual posts. (Darwish et al. 2020) conducted user clustering to identify their stances. (Magdy et al. 2016) tried to predict user perspectives on major events based on network dynamics and user interactions. (Stefanov et al. 2020) adopted label propagation and proposed a semi-supervised approach.

For stance detection in news media, a supervised task is often formed to classify a news document into several perspective labels. Previous works have proposed to leverage semantic information in news articles for perspective detection, such as linguistic indicators (Field et al. 2018), bias features (Horne, Khedr, and Adali 2018) and deep language encoders (Li and Goldwasser 2021; Yang et al. 2016; Jiang et al. 2019). Later approaches have tried to enrich the textual content of news with graph structures of online communities that interact with different news outlets. (Pan et al. 2016) studied the problem of learning text representation with the help of social information. (Li and Goldwasser 2019) supplemented news articles with a network of Twitter users and their interaction. In this paper, we focus on political perspective detection in news media and build on these works while also incorporating external knowledge of contemporary politics.

Figure 2: Partial visualization of our constructed political knowledge graph, which serves as external knowledge for political perspective detection in news media.

Knowledge Graph Applications

Knowledge graphs that represent diversified relations between real-world entities are increasingly popular in AI research. Many large-scale knowledge graphs have been constructed and a wide range of tasks and solutions have been proposed, such as knowledge graph completion (Bordes et al. 2013; Sun et al. 2019) and modeling temporal knowledge graphs (Ma, Tresp, and Daxberger 2019; Liu et al. 2020b). Apart from being an effective data structure to represent real-world knowledge, knowledge graphs also serve as domain knowledge for a wide range of tasks, such as language representation learning (Liu et al. 2020a; He et al. 2019), question answering (Lin et al. 2019; Ding et al. 2019) and recommender systems (Wang et al. 2019; Xian et al. 2019). In this paper, we build on these works and leverage external knowledge for political perspective detection in the form of knowledge graphs.

Methodology

Overview

We propose a political perspective detection method that constructs heterogeneous information networks to represent news articles and frames the task as graph-level classification. Specifically, we first construct a knowledge graph of contemporary U.S. politics to serve as external knowledge for perspective detection. We then transform news articles into heterogeneous information networks, which model different semantic granularities and incorporate external knowledge. Finally, we adopt gated relational graph convolutional networks to learn representations and conduct political perspective detection.

| Entity Type | Example | Relation Type | Example |
|---|---|---|---|
| elected office | the U.S. Senate | affiliated to | (Joe Biden, affiliated to, Democratic Party) |
| time period | 117th congress | from | (Ted Cruz, from, Texas) |
| president | Joe Biden | appoint | (Donald Trump, appoint, Amy Coney Barrett) |
| supreme court justice | Amy Coney Barrett | overlap with | (Joe Biden, overlap with, 117th congress) |
| senator | Elizabeth Warren | member of | (Dianne Feinstein, member of, the U.S. Senate) |
| congressperson | Nancy Pelosi | strongly favor | (Bernie Sanders, strongly favor, liberal values) |
| governor | Ron DeSantis | favor | (Joe Cunningham, favor, liberal values) |
| state | Massachusetts | neutral | (Henry Cuellar, neutral, liberal values) |
| political party | Republican Party | oppose | (Lamar Alexander, oppose, liberal values) |
| political ideology¹ | liberal values | strongly oppose | (Ted Cruz, strongly oppose, liberal values) |

Table 1: List of entities and relations in our collected political knowledge graph.

Knowledge Graph Construction

We first build a knowledge graph of contemporary U.S. politics to serve as external knowledge for our political perspective detection model. We select active and important entities in the social and political context such as political ideologies, elected officials and political organizations. We then retrieve information from Wikipedia to determine relations between these entities and construct the knowledge graph with the derived triples. The entity and relation types are listed in Table 1. The resulting political knowledge graph has 1,071 entities and 10,703 triples. We visualize part of it in Figure 2.
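To make the triple format concrete, the following is a minimal sketch of how (head, relation, tail) triples such as those in Table 1 could be stored and indexed before embedding. The helper names `index_triples` and `outgoing` are hypothetical, and the four triples are only the examples from Table 1, not the full graph.

```python
from collections import defaultdict

# Example triples copied from Table 1; the full graph has 10,703 triples.
TRIPLES = [
    ("Joe Biden", "affiliated to", "Democratic Party"),
    ("Ted Cruz", "from", "Texas"),
    ("Donald Trump", "appoint", "Amy Coney Barrett"),
    ("Ted Cruz", "strongly oppose", "liberal values"),
]


def index_triples(triples):
    """Assign integer ids to entities and relations and encode each triple."""
    entity_ids, relation_ids = {}, {}
    encoded = []
    for head, relation, tail in triples:
        h = entity_ids.setdefault(head, len(entity_ids))
        t = entity_ids.setdefault(tail, len(entity_ids))
        r = relation_ids.setdefault(relation, len(relation_ids))
        encoded.append((h, r, t))
    return entity_ids, relation_ids, encoded


def outgoing(triples):
    """Group (relation, tail) pairs by head entity for quick lookup."""
    by_head = defaultdict(list)
    for head, relation, tail in triples:
        by_head[head].append((relation, tail))
    return dict(by_head)
```

Integer-encoded triples of this kind are the usual input to knowledge graph embedding models, so the same structure could feed the embedding step described later.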

News Graph Construction

With the advent of graph neural networks (GNNs), many works have proposed to construct graphs to represent text and apply GNNs to tasks in natural language processing, such as text classification (Yao, Mao, and Luo 2019; Zhang et al. 2020) and question answering (De Cao, Aziz, and Titov 2018; Tang et al. 2020). A central aim of this paper is to incorporate external knowledge in perspective detection, and we inject knowledge into heterogeneous information networks (HINs) to represent the diversified interactions between news text and real-world entities in the knowledge graph. Specifically, we use one document node, s paragraph nodes and n entity nodes to represent the entire document, s paragraphs and n entities in the knowledge graph respectively. After defining the nodes in the HIN, we connect them with three types of edges:

• doc-para edge: the document node is connected with each paragraph node by doc-para edges. This ensures that every paragraph plays a role in analyzing the document.

• para-para edge: each paragraph node is connected with the paragraphs that precede and follow it. These edges preserve the sequential flow of text in the news document.

• para-ent edge: each paragraph node is connected with its mentioned entities. We conduct coreference resolution (Lee, He, and Zettlemoyer 2018) to capture all entity mentions in a news document. These edges aim to incorporate external knowledge of the knowledge graph for political perspective detection.

We denote these three types of edges as O = {doc-para, para-para, para-ent}. We illustrate our proposed HIN structure to represent news articles in Figure 3.

Figure 3: Overview of our proposed heterogeneous information network structure to represent each news article.
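The three edge types can be sketched as typed edge lists. This is an illustrative sketch only: the function name `build_news_graph`, the node numbering (0 for the document, 1..s for paragraphs, s+1..s+n for entities) and the assumption that entity mentions per paragraph are already available from an upstream linker are ours, not the paper's.

```python
def build_news_graph(paragraph_entities):
    """Build typed edge lists for one news article.

    paragraph_entities[i] is the set of knowledge-graph entities mentioned in
    paragraph i (assumed to be produced by an upstream entity linker).
    Each pair is stored once and is meant to be treated as undirected.
    """
    s = len(paragraph_entities)
    entities = sorted(set().union(*paragraph_entities))
    # Assumed numbering: document = 0, paragraphs = 1..s, entities = s+1..s+n.
    entity_node = {name: s + 1 + k for k, name in enumerate(entities)}

    edges = {"doc-para": [], "para-para": [], "para-ent": []}
    for i in range(1, s + 1):
        edges["doc-para"].append((0, i))          # document to every paragraph
        if i < s:
            edges["para-para"].append((i, i + 1))  # consecutive paragraphs
        for name in sorted(paragraph_entities[i - 1]):
            edges["para-ent"].append((i, entity_node[name]))
    return entity_node, edges
```

For an article whose first paragraph mentions Ted Cruz and Texas and whose third paragraph mentions Ted Cruz again, the single Ted Cruz node ends up linked to both paragraphs, which is how the entity ties distant parts of the text together.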

| Hyperparameter | Value |
|---|---|
| size of RoBERTa representation | 768 |
| size of TransE representation | 200 |
| size of gated R-GCN hidden state | 512 |
| optimizer | Adam |
| learning rate | $10^{-3}$ |
| batch size | 16 |
| dropout | 0.5 |
| maximum epochs | 50 |
| L2-regularization λ | $10^{-5}$ |
| gated R-GCN layer count L | 2 |

Table 2: Implementation details and hyperparameter settings of our political perspective detection model.

After obtaining the news graphs, we obtain representations of the document and paragraph nodes by encoding the news title and the text of the i-th paragraph with the pre-trained language model RoBERTa (Liu et al. 2019) and denote them as $v_d$ and $v_{p_i}$. We then obtain representations of entity nodes by encoding the political knowledge graph with TransE (Bordes et al. 2013), a representation learning framework for knowledge graphs, and denote the representation of the i-th entity as $v_{e_i}$. We transform the representation vectors $v_d$, $v_{p_i}$ and $v_{e_i}$ with fully connected layers to obtain the initial features for these nodes on the HIN. Specifically, for the document node and paragraph nodes:

$$x^{(0)}_0 = \phi(W_S \cdot v_d + b_S), \qquad x^{(0)}_i = \phi(W_S \cdot v_{p_i} + b_S) \tag{1}$$

where $W_S$ and $b_S$ are learnable parameters for textual inputs and $\phi$ is any nonlinear function. We adopt Leaky-ReLU for $\phi$ unless otherwise noted. Similarly, we use another fully connected layer to obtain initial features for entity nodes:

$$x^{(0)}_j = \phi(W_E \cdot v_{e_i} + b_E) \tag{2}$$

where $W_E$ and $b_E$ are learnable parameters and $j$ is the index of the HIN node that represents the i-th entity.
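Equations (1) and (2) map text and entity vectors of different sizes into one shared feature space. A dependency-free sketch is given below; the function names, the node ordering and the Leaky-ReLU negative slope of 0.01 are assumptions for illustration, since the paper does not state the slope.

```python
def leaky_relu(value, slope=0.01):
    """Leaky-ReLU; the 0.01 negative slope is an assumed default."""
    return value if value > 0 else slope * value


def fully_connected(weight, bias, vector):
    """Compute phi(W · v + b) with phi = Leaky-ReLU, as in Eq. (1) and Eq. (2)."""
    return [
        leaky_relu(sum(w * v for w, v in zip(row, vector)) + b)
        for row, b in zip(weight, bias)
    ]


def initial_features(title_vec, paragraph_vecs, entity_vecs, text_layer, entity_layer):
    """Return node features ordered as [document, paragraphs..., entities...]."""
    w_s, b_s = text_layer      # shared by the document and paragraph nodes
    w_e, b_e = entity_layer    # separate layer for entity nodes
    features = [fully_connected(w_s, b_s, title_vec)]
    features += [fully_connected(w_s, b_s, vec) for vec in paragraph_vecs]
    features += [fully_connected(w_e, b_e, vec) for vec in entity_vecs]
    return features
```

With the sizes in Table 2, `text_layer` would hold a 512 x 768 weight matrix and `entity_layer` a 512 x 200 one, so every node enters the graph network with a 512-dimensional feature.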

¹ Political ideologies and individuals’ stances towards them are adapted from aflcio.org/scorecard and heritageaction.com/scorecard.

| Item | Value | Item | Value |
|---|---|---|---|
| # articles | 645 | avg. # word | 582.99 |
| # True | 238 | avg. # paragraph | 16.16 |
| # False | 407 | avg. # mentioned entity | 20.75 |
| year | 2019 | avg. # entity mentions | 42.70 |

Table 3: Statistics of the SemEval dataset.

Learning and Optimization

Our model uses gated relational graph convolutional networks (gated R-GCNs) to learn representations for the HIN and conduct political perspective detection on news articles. For the l-th layer of gated R-GCN, we first aggregate messages from neighbors as follows:

$$u^{(l)}_i = \Theta_s \cdot x^{(l-1)}_i + \sum_{r \in O} \sum_{j \in N_r(i)} \frac{1}{|N_r(i)|} \Theta_r \cdot x^{(l-1)}_j \tag{3}$$

where $u^{(l)}_i$ is the hidden representation for the i-th node in the l-th layer, $N_r(i)$ is node i’s neighborhood of relation r, and $\Theta_s$ and $\Theta_r$ are learnable parameters. We then calculate gate levels:

$$a^{(l)}_i = \sigma(W_A \cdot [u^{(l)}_i, x^{(l-1)}_i] + b_A) \tag{4}$$

where $\sigma(\cdot)$ is the sigmoid function, $[\cdot, \cdot]$ denotes the concatenation operation, and $W_A$ and $b_A$ are learnable parameters. We then apply the gate to $u^{(l)}_i$ and $x^{(l-1)}_i$:

$$x^{(l)}_i = \tanh(u^{(l)}_i) \odot a^{(l)}_i + x^{(l-1)}_i \odot (1 - a^{(l)}_i) \tag{5}$$

where $x^{(l)}_i$ is the output of the l-th gated R-GCN layer and $\odot$ denotes the Hadamard product operation.
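Equations (3) to (5) can be traced step by step with a small dependency-free sketch of one gated R-GCN layer. This is not the authors' implementation (which uses torch geometric); the function names and the dictionary layout of `neighbors` are hypothetical, and input and output dimensions are assumed equal so that the gate in Eq. (5) can mix them.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_rgcn_layer(x, neighbors, theta_self, theta_rel, w_gate, b_gate):
    """Apply one gated R-GCN layer.

    x: list of node vectors from the previous layer.
    neighbors[r][i]: list of neighbors of node i under relation r.
    theta_rel[r]: weight matrix for relation r.
    """
    out = []
    for i, x_i in enumerate(x):
        # Eq. (3): self message plus mean-normalised messages per relation.
        u = matvec(theta_self, x_i)
        for r, adjacency in neighbors.items():
            nbrs = adjacency.get(i, [])
            for j in nbrs:
                msg = matvec(theta_rel[r], x[j])
                u = [a + m / len(nbrs) for a, m in zip(u, msg)]
        # Eq. (4): gate computed from the concatenation [u, x_i].
        gate = [sigmoid(z + b) for z, b in zip(matvec(w_gate, u + x_i), b_gate)]
        # Eq. (5): gated mix of the new message and the previous state.
        out.append([math.tanh(a) * g + p * (1.0 - g) for a, g, p in zip(u, gate, x_i)])
    return out
```

When the gate is close to 0 the node keeps its previous state, and when it is close to 1 the node adopts the aggregated message, which is what lets the layer decide how much neighborhood information to absorb.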

After applying a total of L gated R-GCN layers, we obtain the learnt node representations $x^{(L)}$. We use two methods to obtain the representation $v_g$ of the entire graph:

• document only, where we use the representation of the document node to represent the entire graph:

$$v_g = x^{(L)}_0 \tag{6}$$

• average paragraphs, where we average representations of all paragraph nodes to represent the entire graph:

$$v_g = \frac{1}{s} \sum_{i=1}^{s} x^{(L)}_i \tag{7}$$

We then transform $v_g$ with a softmax layer to conduct political perspective detection:

$$\hat{y} = \mathrm{softmax}(W_O \cdot v_g + b_O) \tag{8}$$

where $\hat{y}$ is our model’s prediction, and $W_O$ and $b_O$ are learnable parameters.
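The two readout options and the softmax classifier of Eq. (6) to Eq. (8) can be sketched as follows. The function names are hypothetical, and the node ordering (document node first, then the s paragraph nodes, then entity nodes) is an assumption carried over for illustration.

```python
import math


def readout(node_states, num_paragraphs, mode="average paragraphs"):
    """Return the graph vector v_g from the final node states.

    Assumes node 0 is the document and nodes 1..num_paragraphs are paragraphs.
    """
    if mode == "document only":
        return list(node_states[0])                      # Eq. (6)
    paragraphs = node_states[1:num_paragraphs + 1]
    dim = len(paragraphs[0])
    return [sum(p[k] for p in paragraphs) / len(paragraphs) for k in range(dim)]  # Eq. (7)


def softmax(logits):
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(graph_vector, w_out, b_out):
    """Eq. (8): softmax over a linear transform of the graph vector."""
    logits = [
        sum(w * v for w, v in zip(row, graph_vector)) + b
        for row, b in zip(w_out, b_out)
    ]
    return softmax(logits)
```

Entity nodes never enter the readout directly in either setting; their influence reaches the prediction only through the messages they pass to paragraph nodes.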

The loss function of our method is as follows:

$$\mathcal{L} = -\sum_{D} \sum_{i=1}^{Y} y_i \log(\hat{y}_i) + \lambda \sum_{w \in \theta} w^2 \tag{9}$$

where Y is the number of stance labels, y denotes the stance annotation, $\hat{y}$ is the model’s prediction, D denotes the data set, θ denotes all learnable parameters in our proposed model and λ is a hyperparameter.
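As a worked instance of Eq. (9), the sketch below computes the cross-entropy term plus the L2 penalty. It assumes one-hot annotations, so the inner sum over labels reduces to the log-probability of the annotated label; the function name and the flat `parameters` list are illustrative, and the default λ follows Table 2.

```python
import math


def perspective_loss(predictions, labels, parameters, l2_lambda=1e-5):
    """Cross-entropy over the data set plus an L2 penalty, following Eq. (9).

    predictions: one probability vector per article (model output).
    labels: index of the annotated stance label per article (one-hot assumed).
    parameters: flat list of all learnable weights.
    """
    cross_entropy = 0.0
    for probs, label in zip(predictions, labels):
        cross_entropy -= math.log(probs[label])
    l2_penalty = sum(w * w for w in parameters)
    return cross_entropy + l2_lambda * l2_penalty
```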

Experiments

In this section, we compare our proposed solution with state-of-the-art methods on real-world political perspective detection datasets. We also study the effect of incorporating external knowledge, analyze the graph learning techniques in our method and present specific cases to better understand the mechanism and effectiveness of our proposed approach.

| Model | Entity | Knowledge | Accuracy |
|---|---|---|---|
| CNN Glove | | | 0.7963 |
| CNN ELMo | | | 0.8404 |
| HLSTM Glove | | | 0.8158 |
| HLSTM ELMo | | | 0.8328 |
| HLSTM Embed | | | 0.8171 |
| HLSTM Output | | | 0.8125 |
| BERT | | | 0.8341 |
| MEAN ENT | ✓ | | 0.8451 |
| MEAN REL | ✓ | | 0.8309 |
| MEAN Ensemble | ✓ | | 0.8522 |
| OURS DO | ✓ | ✓ | 0.8617 |
| OURS AP | ✓ | ✓ | 0.9071 |

Table 4: Performance of our method and competitive baselines on the SemEval dataset. Entity and Knowledge indicate whether these models leverage entity mentions and external knowledge entailed in news articles.

Dataset

While existing works in political perspective detection have provided many annotated datasets, a vast majority of them focus on detecting stances on social media rather than in news media. We make use of the news-based dataset SemEval, which is the training dataset from SemEval 2019 Task 4: Hyperpartisan News Detection (Kiesel et al. 2019). The task aims to detect hyperpartisan perspectives in news articles, which are manually annotated with binary labels. The statistics of the dataset are presented in Table 3. We follow the same 10-fold cross validation as in (Jiang et al. 2019) and (Li and Goldwasser 2021) so that our results are directly comparable with previous works.
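For readers unfamiliar with the protocol, a generic 10-fold split over the 645 articles can be sketched as below. The exact fold assignment used by the cited works is not given here, so the shuffling, the seed and the function name `k_fold_indices` are assumptions for illustration only.

```python
import random


def k_fold_indices(num_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    The shuffle and seed are illustrative; they do not reproduce the
    fold assignment of any published split.
    """
    order = list(range(num_items))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]   # k nearly equal-sized folds
    for f in range(k):
        test = folds[f]
        train = [idx for g in range(k) if g != f for idx in folds[g]]
        yield train, test
```

Each article appears in exactly one test fold, and the reported accuracy is then the average over the k held-out folds.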

Baselines

We compare our method with competitive baselines, which include both outstanding solutions in the SemEval 2019 contest and state-of-the-art proposals that came after the contest.

• CNN is the first place solution from the SemEval 2019 Task 4 contest (Kiesel et al. 2019). It uses Glove (Pennington, Socher, and Manning 2014) (CNN Glove) and ELMo (Peters et al. 2018) (CNN ELMo) word embeddings as well as convolutional layers for stance prediction. (Jiang et al. 2019)

• HLSTM stands for hierarchical long short-term memory networks (Yang et al. 2016). It encodes news with word-level and sentence-level LSTMs and aggregates them with self-attention. (Li and Goldwasser 2019)

• HLSTM Embed and HLSTM Output use Wikipedia2Vec (Yamada et al. 2018) and BERT-inspired masked entity models to learn entity representations and concatenate them with word embeddings (HLSTM Embed) or document representation (HLSTM Output). (Li and Goldwasser 2021)

• BERT is a language model based on deep bidirectional transformers. Pre-trained BERT is fine-tuned on the task of perspective detection. (Devlin et al. 2018)

• MEAN is a perspective detection method that injects entities and relations into LSTMs with multi-head attention. MEAN could leverage entities (MEAN ENT), relations (MEAN REL) or both (MEAN Ensemble). (Li and Goldwasser 2021)

Figure 4: Count of entity mentions in news articles in the SemEval dataset with regard to the count of mentioned entities, as well as whether our method correctly identifies their political perspectives.

Implementation

We use pytorch (Paszke et al. 2019), pytorch lightning (Falcon 2019), torch geometric (Fey and Lenssen 2019) and the transformers library (Wolf et al. 2020) for an efficient implementation of our proposed political perspective detection model. We present our hyperparameter settings in Table 2 to facilitate reproduction. Our implementation is trained on a Titan X GPU with 12GB memory. We submit our collected political knowledge graph and all implemented code as supplementary material to facilitate reproduction.

Experiment Results

The performance of our method and competitive baselines on the SemEval dataset is presented in Table 4. Ours DO and Ours AP represent our method in the document only and average paragraphs settings respectively. Table 4 demonstrates that:

• Our model, especially Ours AP, consistently outperforms all competitive baselines, including the state-of-the-art method MEAN (Li and Goldwasser 2021).

• Two methods that leverage entity mentions, MEAN (in its ENT and Ensemble variants) and ours, outperform the other baselines. Our method further incorporates external knowledge from the political knowledge graph and outperforms MEAN, which indicates that leveraging both entity mentions and external knowledge is effective.

• For methods that use word embeddings, the ELMo (Peters et al. 2018) variants outperform their Glove (Pennington, Socher, and Manning 2014) counterparts in Table 4. This suggests that text analysis is essential in political perspective detection, and we adopt RoBERTa (Liu et al. 2019) as an alternative to ELMo and Glove.

These observations substantiate the general effectiveness of our proposed model. We further examine how external knowledge and graph learning contribute to our model’s outstanding performance. We use OURS AP for these experiments unless otherwise noted.

Figure 5: Our method’s perspective detection accuracy and standard deviation with regard to the percentage of external knowledge included.

External Knowledge Study

Our proposed model focuses on incorporating external knowledge of social and political context in the task of political perspective detection. We construct a knowledge graph of contemporary U.S. politics, learn representations for entities and relations and inject them into a heterogeneous information network to represent news documents. To examine whether real-world news articles actually contain entities in the political knowledge graph, we analyze news articles in the SemEval dataset and present our findings in Figure 4. Figure 4 illustrates that most news articles contain a considerable number of entities in the political knowledge graph. In addition, news articles with the largest number of entity mentions are generally correctly classified by our model, suggesting that the inclusion of pertinent external knowledge boosts task performance.

To further examine the effect of external knowledge, we gradually reduce the inclusion of external knowledge by randomly removing para-ent edges in the HINs and present model performance in Figure 5. Our model reaches the best performance with full external knowledge, and detection accuracy decreases slightly with reduced knowledge. Performance drops sharply when all external knowledge and entities are removed from the HIN, which suggests that as little as 10% of external knowledge helps our model better understand the context of the news article and leads to significantly improved task performance.
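The edge-removal procedure in this study can be sketched as random subsampling of para-ent edges. The function name, the rounding rule and the fixed seed are assumptions for illustration; the paper only states that edges are removed at random.

```python
import random


def keep_fraction_of_edges(para_ent_edges, fraction, seed=0):
    """Keep a random subset of para-ent edges.

    fraction is the share of external knowledge retained, between 0.0 and 1.0.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    keep = round(fraction * len(para_ent_edges))
    return sorted(rng.sample(para_ent_edges, keep))
```

Sweeping `fraction` from 1.0 down to 0.0 and retraining at each level would give an accuracy curve of the kind described for Figure 5.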

As a result, these studies demonstrate that incorporating external knowledge is essential in the task of political perspective detection and that our proposed approach of leveraging political knowledge graphs with heterogeneous information networks is generally effective.

Figure 6: Ablation study removing doc-para (edge 1), para-para (edge 2) and para-ent (edge 3) edges and removing the graph structure altogether.

| Ablation Settings | Accuracy |
|---|---|
| full initial features | 0.9071 ± 0.0390 |
| no RoBERTa features | 0.6341 ± 0.0371 |
| no TransE features | 0.8776 ± 0.0405 |
| no RoBERTa and TransE features | 0.6326 ± 0.0372 |

Table 5: Ablation study substituting RoBERTa and TransE node features in our news HIN with random variables.

Graph Learning Study

To the best of our knowledge, the proposed model is the first to frame the task of political perspective detection as graph-level classification. Specifically, we construct heterogeneous information networks that contain both the textual content of the news article and external knowledge of mentioned entities. To examine whether our proposed graph learning schema is essential to the model’s performance, we conduct a series of ablation studies regarding the HIN graph structure, the initial node features and the graph operators we adopted.

Graph Structure

We propose to construct HINs to represent news articles, which consist of three types of nodes connected with three types of edges. In order to examine whether the heterogeneous graph structure is effective, we remove each of the three types of edges from the news graph and present our model’s performance in Figure 6. Removing any type of edge from the HIN results in a considerable performance drop, which suggests that our proposed HIN structure captures the semantic content of news articles as well as the diversified interactions between text and entities in the knowledge graph.

| Bias | Gold | Ours | #Entity | #Paragraph | Title of the News Article |
|---|---|---|---|---|---|
| lean right | True | True | 37 | 26 | THEY DON’T CALL IT ‘THE GREAT TWEET OF CHINA’ |
| lean right | True | True | 29 | 26 | Hanson: Progressive attacks on Trump are backfiring |
| lean left | True | True | 48 | 38 | When the Parades Are Over, Who Stands With Unions? |
| lean left | True | False | 13 | 24 | The G.O.P.’s Radical Supreme Court Talk |
| center | False | True | 5 | 6 | Why California is closer to becoming a sanctuary state |
| center | False | False | 26 | 33 | UK urged to follow Donald Trump’s vow to ’deport illegal immigrants’ |

Table 6: Example of political perspective detection by our proposed method.

Node Features

Upon obtaining the news graphs, we adopt RoBERTa (Liu et al. 2019) and TransE (Bordes et al. 2013) to encode news titles, paragraph text and entities in the knowledge graph. We then use these learnt representations as initial features for the three types of nodes in the news graph. To examine whether these initial features play an important role in our model’s performance, we substitute them with random variables. Detection accuracy and standard deviation under these settings are presented in Table 5. RoBERTa features of the textual content are a dominant factor in the model’s performance, which suggests that the task of political perspective detection still relies heavily on text modeling and analysis. TransE features are also beneficial: accuracy rises from 0.8776 without them to 0.9071 with them (Table 5), a gain of roughly 3 percentage points.
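The feature-substitution ablation can be sketched as replacing the vectors of one node type with random values of the same size. The paper says only "random variables", so the standard Gaussian used here, the function name and the `node_types` labels are assumptions for illustration.

```python
import random


def randomize_features(features, node_types, target_type, seed=0):
    """Replace the features of every node of target_type with Gaussian noise.

    features: one vector per node; node_types: one type label per node.
    Vectors of other node types are copied unchanged.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in vec] if kind == target_type else list(vec)
        for vec, kind in zip(features, node_types)
    ]
```

Calling it with `target_type="entity"` would correspond to the "no TransE features" row of Table 5, and calling it on the text nodes to the "no RoBERTa features" row.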

Graph Operators After obtaining the heterogeneousnews graphs, we adopt gated R-GCNs to propagate nodemessages and learn representations for nodes and graphs.Besides gated R-GCN, there are many graph operators onboth homogeneous (Kipf and Welling 2016; Velickovic etal. 2017; Hamilton, Ying, and Leskovec 2017) and heteroge-neous graphs (Schlichtkrull et al. 2018). To validate the ef-fectiveness of the graph heterogeneity and our adopted gatedR-GCN operator, we substitute gated R-GCN layers in ourframework with other graph operators and present their per-formances in Figure 7. It is illustrated that vanilla R-GCNoutperforms its homogeneous counterpart GCN, which in-dicates that leveraging heterogeneity in news graphs con-tribute to the model’s performance. Apart from that, ouradopted gated R-GCN performs best among all graph oper-ators, validating our model’s design choices regarding graphneural networks.

To sum up, the structure of heterogeneous news graphs, the initial node features and the gated R-GCN operator adopted in our model are effective and essential to the method’s outstanding performance.

Figure 7: Ablation study substituting our adopted gated R-GCN layers with other graph neural networks.

Case Study
To further understand the inclusion of external knowledge and how our method detects political perspective in news media, we select six representative news articles from the SemEval dataset and analyze their mentioned entities and model predictions. We present these cases and prediction results in Table 6. These examples illustrate a correlation between ample external knowledge and improved stance detection performance, since news articles with more mentioned entities are generally classified correctly. This observation suggests that more external knowledge should be leveraged to identify subtle biases in news articles and boost task performance.
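The observation about entity counts can be checked directly against the six rows of Table 6. The snippet below is only a worked arithmetic example on those six cases; the variable names are illustrative, and a sample this small cannot establish a general trend.

```python
# (bias, gold label, our prediction, #entities, #paragraphs), from Table 6.
cases = [
    ("lean right", True, True, 37, 26),
    ("lean right", True, True, 29, 26),
    ("lean left", True, True, 48, 38),
    ("lean left", True, False, 13, 24),
    ("center", False, True, 5, 6),
    ("center", False, False, 26, 33),
]

# Entity counts of articles where the prediction matches the gold label.
correct = [n_ent for _, gold, ours, n_ent, _ in cases if gold == ours]
# Entity counts of articles where the prediction is wrong.
wrong = [n_ent for _, gold, ours, n_ent, _ in cases if gold != ours]

mean_correct = sum(correct) / len(correct)
mean_wrong = sum(wrong) / len(wrong)

print(mean_correct)  # 35.0
print(mean_wrong)    # 9.0
```

On these examples, correctly classified articles mention 35 entities on average, against 9 for the two misclassified ones.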

Conclusion and Future Work
In this paper, we propose an end-to-end, inductive and graph-based approach to political perspective detection that effectively leverages external knowledge. We collect a knowledge graph of contemporary U.S. politics to serve as external knowledge for political perspective detection. We then propose to incorporate external knowledge of social and political context, construct a heterogeneous information network to represent news articles and frame the task of stance detection as graph classification. We conduct extensive experiments, which demonstrate that our method consistently outperforms state-of-the-art baselines. Further ablation studies show that external knowledge and graph learning are essential to our proposal’s outstanding performance.

In the future, we plan to extend the ideas of representing documents with heterogeneous information networks and leveraging domain knowledge to tasks such as fake news detection and text classification in general.


References
[Anspach and Carlson 2020] Anspach, N. M., and Carlson, T. N. 2020. What to believe? social media commentary and belief in misinformation. Political Behavior 42(3):697–718.

[Augenstein et al. 2016] Augenstein, I.; Rocktaschel, T.; Vlachos, A.; and Bontcheva, K. 2016. Stance detection with bidirectional conditional encoding. arXiv preprint arXiv:1606.05464.

[Bordes et al. 2013] Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems 26.

[Darwish et al. 2020] Darwish, K.; Stefanov, P.; Aupetit, M.; and Nakov, P. 2020. Unsupervised user stance detection on twitter. In Proceedings of the International AAAI Conference on Web and Social Media, volume 14, 141–152.

[De Cao, Aziz, and Titov 2018] De Cao, N.; Aziz, W.; and Titov, I. 2018. Question answering by reasoning across documents with graph convolutional networks. arXiv preprint arXiv:1808.09920.

[Devlin et al. 2018] Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[Ding et al. 2019] Ding, M.; Zhou, C.; Chen, Q.; Yang, H.; and Tang, J. 2019. Cognitive graph for multi-hop reading comprehension at scale. arXiv preprint arXiv:1905.05460.

[Du et al. 2017] Du, J.; Xu, R.; He, Y.; and Gui, L. 2017. Stance classification with target-specific neural attention networks. International Joint Conferences on Artificial Intelligence.

[Falcon 2019] Falcon, WA, e. a. 2019. Pytorch lightning. GitHub. Note: https://github.com/PyTorchLightning/pytorch-lightning3.

[Fey and Lenssen 2019] Fey, M., and Lenssen, J. E. 2019. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428.

[Field et al. 2018] Field, A.; Kliger, D.; Wintner, S.; Pan, J.; Jurafsky, D.; and Tsvetkov, Y. 2018. Framing and agenda-setting in russian news: a computational analysis of intricate political strategies. arXiv preprint arXiv:1808.09386.

[Hamilton, Ying, and Leskovec 2017] Hamilton, W. L.; Ying, R.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 1025–1035.

[He et al. 2019] He, B.; Zhou, D.; Xiao, J.; Liu, Q.; Yuan, N. J.; Xu, T.; et al. 2019. Integrating graph contextualized knowledge into pre-trained language models. arXiv preprint arXiv:1912.00147.

[Hong and Kim 2016] Hong, S., and Kim, S. H. 2016. Political polarization on twitter: Implications for the use of social media in digital governments. Government Information Quarterly 33(4):777–782.

[Horne, Khedr, and Adali 2018] Horne, B. D.; Khedr, S.; and Adali, S. 2018. Sampling the news producers: A large news and feature data set for the study of the complex media landscape. In Twelfth International AAAI Conference on Web and Social Media.

[Jiang et al. 2011] Jiang, L.; Yu, M.; Zhou, M.; Liu, X.; and Zhao, T. 2011. Target-dependent twitter sentiment classification. In Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, 151–160.

[Jiang et al. 2019] Jiang, Y.; Petrak, J.; Song, X.; Bontcheva, K.; and Maynard, D. 2019. Team bertha von suttner at semeval-2019 task 4: Hyperpartisan news detection using elmo sentence representation convolutional network. In Proceedings of the 13th International Workshop on Semantic Evaluation, 840–844.

[Kiesel et al. 2019] Kiesel, J.; Mestre, M.; Shukla, R.; Vincent, E.; Adineh, P.; Corney, D.; Stein, B.; and Potthast, M. 2019. Semeval-2019 task 4: Hyperpartisan news detection. In Proceedings of the 13th International Workshop on Semantic Evaluation, 829–839.

[Kipf and Welling 2016] Kipf, T. N., and Welling, M. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

[Kiros et al. 2015] Kiros, R.; Zhu, Y.; Salakhutdinov, R.; Zemel, R.; Urtasun, R.; Torralba, A.; and Fidler, S. 2015. Skip-thought vectors. In NIPS.

[Lee, He, and Zettlemoyer 2018] Lee, K.; He, L.; and Zettlemoyer, L. 2018. Higher-order coreference resolution with coarse-to-fine inference. In NAACL-HLT.

[Li and Goldwasser 2019] Li, C., and Goldwasser, D. 2019. Encoding social information with graph convolutional networks for political perspective detection in news media. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2594–2604.

[Li and Goldwasser 2021] Li, C., and Goldwasser, D. 2021. Mean: Multi-head entity aware attention network for political perspective detection in news media. In Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda, 66–75.

[Lin et al. 2019] Lin, B. Y.; Chen, X.; Chen, J.; and Ren, X. 2019. Kagnet: Knowledge-aware graph networks for commonsense reasoning. arXiv preprint arXiv:1909.02151.

[Liu et al. 2019] Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; and Stoyanov, V. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

[Liu et al. 2020a] Liu, W.; Zhou, P.; Zhao, Z.; Wang, Z.; Ju, Q.; Deng, H.; and Wang, P. 2020a. K-bert: Enabling language representation with knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2901–2908.

[Liu et al. 2020b] Liu, Y.; Hua, W.; Xin, K.; and Zhou, X. 2020b. Context-aware temporal knowledge graph embedding. In International Conference on Web Information Systems Engineering, 583–598. Springer.


[Ma, Tresp, and Daxberger 2019] Ma, Y.; Tresp, V.; and Daxberger, E. A. 2019. Embedding models for episodic knowledge graphs. Journal of Web Semantics 59:100490.

[Magdy et al. 2016] Magdy, W.; Darwish, K.; Abokhodair, N.; Rahimi, A.; and Baldwin, T. 2016. #isisisnotislam or #deportallmuslims? predicting unspoken views. In Proceedings of the 8th ACM Conference on Web Science, 95–106.

[Mohammad et al. 2016] Mohammad, S.; Kiritchenko, S.; Sobhani, P.; Zhu, X.; and Cherry, C. 2016. A dataset for detecting stance in tweets. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 3945–3952.

[Pan et al. 2016] Pan, S.; Wu, J.; Zhu, X.; Zhang, C.; and Wang, Y. 2016. Tri-party deep network representation. Network 11(9):12.

[Paszke et al. 2019] Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32:8026–8037.

[Pennington, Socher, and Manning 2014] Pennington, J.; Socher, R.; and Manning, C. D. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532–1543.

[Peters et al. 2018] Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

[Schlichtkrull et al. 2018] Schlichtkrull, M.; Kipf, T. N.; Bloem, P.; Van Den Berg, R.; Titov, I.; and Welling, M. 2018. Modeling relational data with graph convolutional networks. In European semantic web conference, 593–607. Springer.

[Stefanov et al. 2020] Stefanov, P.; Darwish, K.; Atanasov, A.; and Nakov, P. 2020. Predicting the topical stance and political leaning of media using tweets. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 527–537.

[Sun et al. 2019] Sun, Z.; Deng, Z.-H.; Nie, J.-Y.; and Tang, J. 2019. Rotate: Knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197.

[Tang et al. 2020] Tang, Z.; Shen, Y.; Ma, X.; Xu, W.; Yu, J.; and Lu, W. 2020. Multi-hop reading comprehension across documents with path-based graph convolutional network. arXiv preprint arXiv:2006.06478.

[Velickovic et al. 2017] Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.

[Wang et al. 2017] Wang, B.; Liakata, M.; Zubiaga, A.; and Procter, R. 2017. Tdparse: Multi-target-specific sentiment recognition on twitter. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 483–493.

[Wang et al. 2019] Wang, X.; Wang, D.; Xu, C.; He, X.; Cao, Y.; and Chua, T.-S. 2019. Explainable reasoning over knowledge graphs for recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 5329–5336.

[Wilson, Parker, and Feinberg 2020] Wilson, A. E.; Parker, V. A.; and Feinberg, M. 2020. Polarization in the contemporary political and media landscape. Current Opinion in Behavioral Sciences 34:223–228. Political Ideologies.

[Wolf et al. 2020] Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; Davison, J.; Shleifer, S.; von Platen, P.; Ma, C.; Jernite, Y.; Plu, J.; Xu, C.; Scao, T. L.; Gugger, S.; Drame, M.; Lhoest, Q.; and Rush, A. M. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 38–45. Online: Association for Computational Linguistics.

[Xian et al. 2019] Xian, Y.; Fu, Z.; Muthukrishnan, S.; De Melo, G.; and Zhang, Y. 2019. Reinforcement knowledge graph reasoning for explainable recommendation. In Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, 285–294.

[Xu et al. 2018] Xu, C.; Paris, C.; Nepal, S.; and Sparks, R. 2018. Cross-target stance classification with self-attention networks. arXiv preprint arXiv:1805.06593.

[Yamada et al. 2018] Yamada, I.; Asai, A.; Sakuma, J.; Shindo, H.; Takeda, H.; Takefuji, Y.; and Matsumoto, Y. 2018. Wikipedia2vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from wikipedia. arXiv preprint arXiv:1812.06280.

[Yang et al. 2016] Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; and Hovy, E. 2016. Hierarchical attention networks for document classification. 1480–1489.

[Yao, Mao, and Luo 2019] Yao, L.; Mao, C.; and Luo, Y. 2019. Graph convolutional networks for text classification. In Proceedings of the AAAI conference on artificial intelligence, volume 33, 7370–7377.

[Zhang et al. 2020] Zhang, Y.; Yu, X.; Cui, Z.; Wu, S.; Wen, Z.; and Wang, L. 2020. Every document owns its structure: Inductive text classification via graph neural networks. arXiv preprint arXiv:2004.13826.

