
Hybrid sentiment and network analysis of social opinion polarization icoict

Date post: 23-Jan-2018
Upload: andry-alamsyah

Hybrid Sentiment and Network Analysis of Social Opinion Polarization

Andry Alamsyah and Fidocia Adityawarman
School of Economic and Business, Telkom University, Indonesia

The 5th International Conference on Information and Communication Technology (ICoICT)

Background

1. Democratization, ICT influence, and social dynamics

2. Mapping from aggregate data rather than from individual case studies (opinion polarization has mostly been examined through qualitative studies)

3. Data is cheap (freely available)

4. Objective: quantifying social opinion polarization

5. Challenge: given large-scale conversational (unstructured) data, how can we summarize it?

6. Approach: a hybrid of structural and content analysis

7. Tools: Social Network Analysis (SNA) for structure and Sentiment Analysis for content

8. Case study: the land reclamation issue in Jakarta, Indonesia

Framework

Twitter data feeds two parallel branches:

• Relationship mining -> Social Network Analysis: network properties, influential actor detection, community detection
• Text mining -> Sentiment Analysis: datasets (train data, test data) -> pre-processing -> vectorization -> classification model (Naïve Bayes) -> validation -> apply model
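The slides name Naïve Bayes as the classifier but do not specify the pre-processing, the vectorization, or the Naïve Bayes variant. As a minimal sketch, assuming lowercase word tokens, bag-of-words counts, and a multinomial Naïve Bayes with add-one (Laplace) smoothing, the train and apply steps of the sentiment branch could look like this. `preprocess`, `NaiveBayesSentiment`, and the toy tweets are hypothetical, not the authors' code or data.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Hypothetical cleaning step: lowercase, drop URLs and @mentions, split into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z0-9]+", text)


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing (assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = preprocess(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in preprocess(doc):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


# Toy labelled tweets (tolak = reject, dukung = support, merusak = damage,
# datangkan pekerjaan = brings jobs); these are not the paper's training data.
train = ["tolak reklamasi", "reklamasi merusak laut",
         "dukung reklamasi", "reklamasi datangkan pekerjaan"]
labels = ["Negative", "Negative", "Positive", "Positive"]
model = NaiveBayesSentiment().fit(train, labels)
print(model.predict("tolak reklamasi teluk"))  # -> Negative
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow when a tweet contains many words.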

Data

• Source: Twitter, June 23 – July 23, 2016
• Raw data: 60,828 tweets
• After pre-processing: 23,115 tweets
• Latest tweets: 7,345 tweets

Keywords and Hashtags

Keywords: “reklamasi” (reclamation), “reklamasi jakarta”, “teluk benoa”, “reklamasi teluk benoa”, “reklamasi makassar”, and “pulauG”

Hashtags: #reklamasi, #telukbenoa, #reklamasijakarta, #reklamasijkt, #reklamasitelukbenoa, #telukjakarta, #reklamasibali, #reklamasiuntukjakarta, #dukungreklamasi, #tolakreklamasi
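The slides list the query terms but not how tweets were matched, or how the 60,828 raw tweets were reduced to 23,115. As a hedged sketch, assuming case-insensitive whole-phrase keyword matching, exact hashtag matching, and removal of duplicate texts as one possible pre-processing step, the collection filter could look like this. `matches_query` and `filter_and_dedupe` are hypothetical helpers.

```python
import re

# Query terms from the slide, lowercased for case-insensitive matching.
KEYWORDS = ["reklamasi", "reklamasi jakarta", "teluk benoa",
            "reklamasi teluk benoa", "reklamasi makassar", "pulaug"]
HASHTAGS = {"#reklamasi", "#telukbenoa", "#reklamasijakarta", "#reklamasijkt",
            "#reklamasitelukbenoa", "#telukjakarta", "#reklamasibali",
            "#reklamasiuntukjakarta", "#dukungreklamasi", "#tolakreklamasi"}


def matches_query(tweet):
    """True if the tweet contains a listed hashtag or a whole-word keyword phrase."""
    text = tweet.lower()
    if set(re.findall(r"#\w+", text)) & HASHTAGS:
        return True
    return any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in KEYWORDS)


def filter_and_dedupe(tweets):
    """Keep matching tweets, dropping repeats that differ only in case or spacing (assumed rule)."""
    seen, kept = set(), []
    for tweet in tweets:
        key = " ".join(tweet.lower().split())
        if key not in seen and matches_query(tweet):
            seen.add(key)
            kept.append(tweet)
    return kept
```

Because "reklamasi" is itself a keyword, it already covers the longer "reklamasi ..." phrases; they only matter if the collection tool treated each term as a separate query.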

Sentiment Analysis

Sentiment Labeling (example tweets, translated from Indonesian)

Sentence -> Sentiment

• "This isn't a victory, it's only a compromise to protect something bigger. Keep rejecting the reclamation and the giant sea wall!" -> Negative

• "In my opinion, whether it's in Bali or Jakarta, reclamation is reclamation. There's no difference. It's destructive." Then I got a consoling pat on the back -> Negative

• "Yes, agreed... the ones who disagree are actually the political parties fighting over the rents, a.k.a. the tribute, from the reclamation" -> Positive

• "Island reclamation should be seen for its positives: it brings jobs, economic growth, etc. etc... the stubborn minister should just be replaced!" -> Positive

Sentiment Analysis (2)

Train data: 1,667 Positive, 1,974 Negative

Confusion matrix (train data):

                  True Positive   True Negative   Class precision
Pred. Positive    1631            58              96.57%
Pred. Negative    36              1916            98.16%
Class recall      97.84%          97.06%

Accuracy: 97.42%

Test data (predicted): 8,134 Positive, 14,981 Negative

Test Data

The train data results indicate that the classification model classifies the data very well. An accuracy of 97.42% suggests a high probability that the model classifies the test data correctly. Both precision (the predictive accuracy of the model within a class) and recall (the rate at which the model retrieves the data relevant to a class) are also high.
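All of the percentages on this slide follow from the four counts in the confusion matrix. The short sketch below recomputes them, reading rows as predicted classes and columns as true classes; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics for a 2x2 confusion matrix with Positive as the reference class.

    tp: true Positive predicted Positive    fp: true Negative predicted Positive
    fn: true Positive predicted Negative    tn: true Negative predicted Negative
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision_pos": tp / (tp + fp),  # row "Pred. Positive"
        "recall_pos": tp / (tp + fn),     # column "True Positive"
        "precision_neg": tn / (tn + fn),  # row "Pred. Negative"
        "recall_neg": tn / (tn + fp),     # column "True Negative"
    }


# Counts taken from the confusion matrix above.
m = classification_metrics(tp=1631, fp=58, fn=36, tn=1916)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```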

Applied to the test data of 23,115 tweets, the model classified 14,981 (65%) as negative and 8,134 (35%) as positive.

Social Network Analysis

Edges are colored by the sentiment of their target node. Many connections target nodes with undefined sentiment, which suggests that interactions between nodes occur when:

1) Twitter users respond to a tweet posted by a news portal node, expressing their sentiment through a retweet, reply, or mention.

2) Twitter users spread their views to individuals (nodes) whose sentiment toward the reclamation issue has not been clearly identified.

3) Twitter users direct their grievances at accounts belonging to public figures, in the form of criticism or suggestions.

The resulting network consists of 4,832 nodes and 5,152 edges.

• In the graph, red nodes are actors with negative sentiment (counter-reclamation), blue nodes are actors with positive sentiment (pro-reclamation), and gray nodes are nodes whose sentiment is undefined.

Overall Network
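The slides do not say exactly how the interaction network was assembled. Assuming each retweet, reply, or mention becomes a directed (source, target) edge, and that nodes without a classified tweet have undefined sentiment, coloring edges by target sentiment and tallying them could be sketched as follows. The helper name, the color map, and the toy data are hypothetical.

```python
from collections import Counter

# Assumed color scheme mirroring the slide: red = negative, blue = positive, gray = undefined.
COLORS = {"negative": "red", "positive": "blue", None: "gray"}


def color_edges_by_target(edges, sentiment):
    """Return (source, target, color) triples and a tally of edges per target sentiment.

    edges: iterable of (source, target) interactions (retweet, reply, or mention).
    sentiment: dict node -> "positive" / "negative"; nodes missing from it are undefined.
    """
    colored, tally = [], Counter()
    for src, dst in edges:
        label = sentiment.get(dst)  # None when the target has no classified sentiment
        colored.append((src, dst, COLORS[label]))
        tally[label or "undefined"] += 1
    return colored, tally


# Toy example: two users reply to a news portal account with no sentiment label.
edges = [("u1", "kompascom"), ("u2", "kompascom"), ("u1", "u2"), ("u3", "u1")]
sentiment = {"u1": "negative", "u2": "positive", "u3": "negative"}
colored, tally = color_edges_by_target(edges, sentiment)
print(tally)
```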

Social Network Analysis (2)

• To better observe opinion polarization, we filter the graph to nodes with positive or negative sentiment and ignore nodes with unidentified sentiment. Edges are colored by the sentiment of their source node (for example, blue marks an interaction from a positive node to a negative node, and red marks the reverse). The graph shows a conflict of opinion between the two opposing groups (positive-negative, negative-positive), although nodes also tend to interact with nodes of the same sentiment (positive-positive, negative-negative).

Polarization
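The filtering step and the comparison of same-sentiment versus cross-sentiment interactions can be sketched by counting edges per (source sentiment, target sentiment) pair. This assumes the same directed edge list as before; `sentiment_pair_counts` and `intra_inter_share` are hypothetical helpers, and the slides report the pattern visually rather than as these shares.

```python
from collections import Counter


def sentiment_pair_counts(edges, sentiment):
    """Count edges between nodes with defined sentiment, keyed by (source, target) sentiment."""
    pairs = Counter()
    for src, dst in edges:
        s, t = sentiment.get(src), sentiment.get(dst)
        if s is None or t is None:  # drop nodes with unidentified sentiment
            continue
        pairs[(s, t)] += 1
    return pairs


def intra_inter_share(pairs):
    """Share of edges within the same sentiment group and across groups."""
    total = sum(pairs.values())
    if total == 0:
        return 0.0, 0.0
    intra = sum(n for (s, t), n in pairs.items() if s == t)
    return intra / total, (total - intra) / total


# Toy graph: node "x" has no sentiment label, so its edge is filtered out.
edges = [("a", "b"), ("a", "c"), ("c", "d"), ("d", "x")]
sentiment = {"a": "positive", "b": "positive", "c": "negative", "d": "negative"}
pairs = sentiment_pair_counts(edges, sentiment)
print(pairs, intra_inter_share(pairs))
```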

Community Detection

The Louvain modularity method finds 7 communities that together cover 59.39% of the network. The remaining 40.61% consists of 770 small communities containing only 2 to 22 nodes each (0.02%–0.46% of the network).

communities

Communities with sentiments
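Louvain repeatedly moves nodes between communities, and then merges communities, whenever doing so increases modularity. The sketch below does not implement Louvain itself; it only computes the modularity score Q that Louvain maximizes, for an undirected, unweighted edge list without self-loops. That graph treatment is an assumption, since the slides do not state how edge direction or weights were handled, and `modularity` here is a hypothetical helper. NetworkX provides a full implementation as `networkx.community.louvain_communities`.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph (no self-loops).

    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where m is the number of
    edges, L_c the number of edges inside c, and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single edge: the natural split scores well above zero.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 4))  # -> 0.3571
```

Putting every node in one community always gives Q = 0, which is why a positive Q indicates more within-community edges than a random graph with the same degrees would have.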

Community Detection (2)

The largest community in the network (22.56% of the network's total size) is composed of nodes with different sentiments. This suggests that the communities in the network are heterogeneous.
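Checking whether a community is heterogeneous amounts to looking at the sentiment mix of its members. A minimal sketch, assuming a node-to-community mapping (for example, from Louvain) and the node sentiment labels used earlier; both helper names are hypothetical, and "heterogeneous" is taken here to mean that more than one defined sentiment is present.

```python
from collections import Counter


def largest_community_profile(community, sentiment):
    """Return (share of network, sentiment counts) for the largest community.

    community: dict node -> community id.
    sentiment: dict node -> "positive" / "negative"; missing nodes count as "undefined".
    """
    sizes = Counter(community.values())
    biggest, size = sizes.most_common(1)[0]
    mix = Counter(sentiment.get(n, "undefined") for n, c in community.items() if c == biggest)
    return size / len(community), mix


def is_heterogeneous(mix):
    """Assumed rule: heterogeneous if the community holds more than one defined sentiment."""
    return len({s for s in mix if s != "undefined"}) > 1


community = {"a": 0, "b": 0, "c": 0, "d": 1}
sentiment = {"a": "positive", "b": "negative"}
share, mix = largest_community_profile(community, sentiment)
print(share, mix, is_heterogeneous(mix))
```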

Influential Actors

Table 3. Influential actors by eigenvector centrality

No.  Username          Eigenvector centrality
1    basuki_btp        1
2    susipudjiastuti   0.997694
3    jokowi            0.878031
4    ramlirizal        0.707369
5    kpk_ri            0.650982
6    temanahok         0.47271
7    kompascom         0.466773
8    rudolfdethu       0.43266
9    geetnotgood       0.418927
10   detikcom          0.362747
11   reiza_patters     0.345125
12   gendovara         0.329106
13   ilc_tvonenews     0.322044
14   metro_tv          0.268044
15   indrajpiliang     0.227413

Table 3 shows that the Twitter user @basuki_btp has the highest eigenvector centrality in the network. It is followed by @susipudjiastuti (0.997694), @jokowi (0.878031), @ramlirizal (0.707369), and @kpk_ri (0.650982).
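Eigenvector centrality scores a node highly when it is connected to other high-scoring nodes, so accounts that receive interactions from many well-connected users rise to the top. The sketch below computes it by power iteration on an undirected graph. Treating the graph as undirected is an assumption, since the slides do not state whether edge direction was used. Scores are rescaled so the top node has 1, which matches how Table 3 reports values. The function name and toy graph are hypothetical.

```python
def eigenvector_centrality(edges, iterations=1000, tol=1e-10):
    """Eigenvector centrality of an undirected graph via shifted power iteration.

    Scores are rescaled each step so the highest-scoring node has 1.0.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    if not neighbours:
        return {}
    x = {n: 1.0 for n in neighbours}
    for _ in range(iterations):
        # x_new = (A + I) x: the identity shift keeps the same leading eigenvector
        # but prevents oscillation on bipartite graphs such as stars.
        new = {n: x[n] + sum(x[m] for m in neighbours[n]) for n in neighbours}
        top = max(new.values())
        new = {n: val / top for n, val in new.items()}
        if max(abs(new[n] - x[n]) for n in new) < tol:
            return new
        x = new
    return x


# Toy star graph: one hub mentioned by three users.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
scores = eigenvector_centrality(star)
print({n: round(s, 4) for n, s in scores.items()})
```

On the star, the hub scores 1 and each leaf converges to 1/√3 ≈ 0.577. A directed variant would sum over incoming neighbours only, which is one way a mentioned official could outrank the users mentioning them.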

The top four nodes are Twitter accounts belonging to public officials. This suggests that Twitter users direct their opinions and grievances, whether criticism or suggestions, at government accounts.

Among news portals, @kompascom ranks seventh with an eigenvector centrality of 0.466773, @detikcom ranks 10th with 0.362747, and @metro_tv ranks 14th with 0.268044.

Conclusions

• Sentiment analysis of reclamation-related tweets using Naïve Bayes classification reached 97.42% accuracy, indicating that the classification model works well and reliably. Of the 23,115 test tweets, the model classified 14,981 (65%) as negative and 8,134 (35%) as positive. In conversations about the reclamation issue, counter-reclamation tweets therefore dominate pro-reclamation tweets.

• The network analysis shows that the dominant interaction is nodes with a defined sentiment expressing it toward nodes with undefined sentiment. The largest community in the network is composed of nodes with different sentiments. Influential actor detection shows that the top four nodes are Twitter accounts belonging to public officials, which suggests that Twitter users direct their opinions and grievances, whether criticism or suggestions, at government accounts.

• In conclusion, the hybrid method proved relevant for analyzing social opinion polarization, giving a more comprehensive picture than a structural or content approach alone. It is richer in information: inter- and intra-group sentiment interactions, the heterogeneity or homogeneity of group sentiment, and the possibility of measuring how network sentiment changes over time. The method can be applied to any case that shows signs of opinion polarization, once there is enough evidence of conflicting interests.

THANK YOU

