Date post:11-Mar-2020
  • Mining and Analysing Social Network in the Oil Business: Twitter Sentiment Analysis and Prediction Approaches

    Hanaa Ali Aldahawi

    2015

    Cardiff University

    School of Computer Science and Informatics

    A thesis submitted in partial fulfilment of the requirement for the degree of Doctor of Philosophy

  • Declaration

    This work has not previously been accepted in substance for any degree and is not concurrently submitted in candidature for any degree.

    Signed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (candidate)

    Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Statement 1

    This thesis is being submitted in partial fulfilment of the requirements for the degree of PhD.

    Signed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (candidate)

    Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Statement 2

    This thesis is the result of my own independent work/investigation, except where otherwise stated. Other sources are acknowledged by explicit references.

    Signed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (candidate)

    Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Statement 3

    I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-library loan, and for the title and summary to be made available to outside organisations.

    Signed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (candidate)

    Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

  • Dedication

    To my parents

    With profound gratitude to Mum - I hope I made you proud. In loving memory of Dad; you would have been proud of me.

  • Abstract

    Twitter is a rich source of data for opinion mining and sentiment analysis that companies can use to improve their strategy with the public and stakeholders. However, extracting and analysing information from unstructured text remains a hard task. The aim of this research is to investigate the use of Twitter by “controversial” companies and other users. In particular, it looks at the nature of positive and negative sentiment towards oil companies and shows how this relates to cultural effects and the network structure. This has required the evaluation of existing automated methods for sentiment analysis and the development of improved methods based on user classification. The research showed that tweets about oil companies were noisy enough to affect the accuracy of sentiment classification. In this thesis, we analysed data collected from Twitter and investigated the variance that arises from using an automated sentiment analysis tool versus crowdsourced human classification. Our particular interest lay in understanding how users’ motivation to post messages affected the accuracy of sentiment polarity. The dataset comprised tweets originating from two of the world’s leading oil companies, BP America and Saudi Aramco, representing Western and Middle Eastern countries respectively, and from other users that follow and mention them. Our results show that the two methods yield significantly different positive, neutral and negative classifications depending on culture and the relationship of the poster of the tweet to the two companies. This motivated the investigation of the relationship between sentiment and user groups extracted by applying machine learning classifiers. Finally, clustering based on similarities in the network structure was used to connect user groups, and a novel technique to improve the sentiment accuracy was proposed. The analytical technique used here provided structured and valuable information for oil companies and has applications to other controversial domains.

  • Acknowledgements

    I would like to express my gratitude to my supervisors, Dr Stuart Allen and Professor Roger Whitaker, for their support throughout this research. In particular, I would like to acknowledge my debt to Dr Allen, my main supervisor, and express to him my special appreciation and thanks. He has been a tremendous mentor for me, with his patience and knowledge. His guidance, advice and continual encouragement at all stages of my PhD helped me tackle the challenges of this research and shaped my ideas, keeping me on track and helping me become an independent researcher. I am very fortunate to have been under his supervision and I have learned a lot from him.

    I would like to extend my thanks to my sponsor in Saudi Arabia, King Abdul-Aziz University (KAU), for the scholarship and the continuous support throughout the years of my study in the UK. I am also thankful to the head of the Information Science Department and all the staff for their support, encouragement and friendship. Special thanks and appreciation also go to the UK Saudi Arabian Cultural Bureau for their help and support.

    I would like, too, to thank Dr Martin Chorley for his consistently helpful crowdsourcing expertise, which is an important part of this thesis. I am also grateful to Dr Matt Williams for his help in learning the Python language in the early stages of this research. Thanks also to all members of the School of Computer Science and Informatics for their helpful discussions, comments, feedback, events and facilities. Special thanks to Dr Rob Davies and Mrs Helen Williams for technical and administrative support.

    I am deeply grateful to my family, who always encourage and support me. To my mum, who provided endless support, encouragement and love, to make me who I am today. Words cannot express how grateful I am to her for all the sacrifices that she has made for my sake. Her prayer for me has been what has sustained me thus far. To my brother Mohammed, who was a good companion throughout my period of study in the UK and was very supportive and caring. To my sisters Wafaa, Rajaa and Asmaa, who believed in me. To my niece Hala and my nephews Ziyad and Yossef for their love. I am also thankful to my bigger family, my aunts and uncles, particularly uncle Mohammed and aunt Saleha. They all deserve my utmost thanks.

    My deep gratitude goes to my friends. To Liqaa Nwaf for her support, kindness and care during the difficult period of my PhD. I am very lucky to have found a big-hearted person like Liqaa to be my close friend. To Fatima Alrayes for her support and caring. To Shada Alsalamah and Haya Almagwashi for their support, feedback, and precious friendship. To all my colleagues who have been positive and supportive during the PhD journey.

    Finally, I am extremely grateful to all those who made my study journey easier and finally successful.

  • Contents

    Abstract

    Acknowledgements

    Contents

    List of Figures

    List of Tables

    List of Acronyms

    List of Publications

    1 Introduction

    1.1 Research Problem and Motivation

    1.2 Hypothesis and Research Questions

    1.3 Case Study

    1.4 Research Contributions

    1.5 Thesis Structure

    2 Background and Literature Review

    2.1 Social Network Background

    2.1.1 Definition

    2.1.2 Types of Social Network

    2.2 Twitter Analysis in Business

    2.2.1 The Benefits of Social Networks to Business

    2.2.2 The Impact of Cultural Differences

    2.3 Twitter and Data Analysis Techniques

    2.3.1 Network Structure

    2.3.2 Sentiment Analysis

    2.3.3 Supervised and Unsupervised Machine Learning

    2.4 Evaluation Measures

    2.5 Conclusion

    3 Twitter Usage by Oil Companies

    3.1 Overview of Primary Analysis Dataset Collection

    3.1.1 Tweet Rate over Time

    3.1.2 Hashtags

    3.1.3 Hyperlinks

    3.1.4 Retweets

    3.1.5 Mentions

    3.2 Conclusion

    4 Sentiment Analysis

    4.1 Sentiment Analysis Techniques Used in This Work

    4.1.1 Manual Sentiment Analysis
