
Mobile Social Media Mining Challenges Overview: A Case Study of WeChat

Thomas E. Epalle

School of Mathematics, Physics and Information Engineering

Zhejiang Normal University, Jinhua, China

[email protected]

Abstract

Data mining can be a very valuable tool, especially when applied to social media. Social media mining is the process of representing, analyzing, and extracting useful patterns from social media data resulting from social interactions. Since its release in 2011, WeChat has quickly become the fastest-growing mobile networking app in China. People of similar interests and backgrounds meet and cooperate using this social network, which enables them to share information flexibly and globally. Today, just like Facebook and Twitter, WeChat holds huge amounts of unprocessed raw data. The information mined from this mobile social media can considerably impact the business strategy of any enterprise. However, unlike Facebook and Twitter, WeChat remains a very complex source for the general public to mine. This paper identifies and analyzes some challenges that engineers and researchers may face if they want to mine the rich user-generated content in the WeChat database.

Keywords: Mobile Social Media; WeChat; Challenges; Data Mining; Opinion Mining

1 Introduction

This era is known as the age of big data. Hundreds of millions of people all over the world spend countless hours connecting, interacting, communicating, and sharing data through social media, which is now one of the biggest repositories of big data. With this big data comes an unprecedented opportunity for data mining research. Mining data in social media is the process of collecting, searching, and analyzing social media structure, as well as the large amount of user-generated data, in order to discover useful patterns and relationships.

This new research field has attracted researchers from different backgrounds such as computer science, social sciences, data mining, machine learning, natural language processing, text mining, social media analysis, and information retrieval. Because social media data differs from the data used in classical data mining, new methods are needed to explore and analyze this unparalleled source of data.

Many definitions of social media have been proposed in the literature [1][2][3]. Some are more abstract than others, such as the one proposed by the authors of [2], who define social media as a set whose elements are social atoms (individuals), entities, and interactions. Kaplan and Haenlein propose a more concise definition: they consider social media to be the group of internet-based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content [4].

There are many types of social media, and their classification and characterization vary from author to author in the rich social media literature [1][2]. Kaplan and Haenlein [4] classify social media sites into blogs, content communities, collaborative projects, virtual social worlds, virtual game worlds, and social networking sites. The authors of [2] give another interesting classification of social media. The table below summarizes their classification and lists some real examples:

| Type of social media | Examples |
|---|---|
| Social networking | Facebook, LinkedIn |
| Microblogging | Twitter, Weibo |
| Photo sharing | Flickr, Photobucket, Picasa |
| News aggregation | Google Reader, StumbleUpon, Feedburner |
| Video sharing | YouTube, Youku, MetaCafe |
| Livecasting | Ustream, Justin.TV |
| Virtual worlds | Kaneva |
| Social gaming | World of Warcraft |
| Social search | Google, Bing, Ask |
| Instant messaging | Google Talk, Skype, Yahoo! Messenger, WeChat, QQ, Bigo, WhatsApp |

Table 1: Types of social media

Mobile social media are those that provide an app usable both on a computer and on mobile platforms such as Android, iPhone, BlackBerry, Windows Phone, and Symbian.

Thomas E Epalle, Int.J.Computer Technology & Applications,Vol 6 (2),347-351

IJCTA | Mar-Apr 2015 Available [email protected]


ISSN:2229-6093

In this paper we consider the challenges that data mining researchers and engineers have to overcome in order to mine the vast amount of user-generated data held by mobile social media. WeChat is used as a case study, under the hypothesis that its mining challenges can easily be generalized to other mobile social media apps.

2 What is WeChat?

WeChat (微信, Weixin), which literally means "micro message", is the most widely used standalone mobile voice and text messaging app in China. It was first released in January 2011 by Tencent. By August 2014, WeChat already had 438 million active users, 70 million of them outside China. The WeChat app is available for Android, iPhone, BlackBerry, Windows Phone, and Symbian phones. There are also web-based clients, but the user must have the app installed on a supported mobile phone for identification by scanning a QR code. WeChat registration is done with a phone number or a Facebook account. The app provides text messaging, hold-to-talk voice messaging, broadcast messaging, picture and video sharing, and location sharing. It can exchange contacts with people nearby via Bluetooth, provides various features for contacting people at random if desired (if these people are open to it), and integrates with social networking services such as those run by Facebook and Tencent QQ. Photographs may also be embellished with filters and captions, and a machine translation service is also available.

WeChat is thus a rich warehouse for data mining engineering and research. Its network structure, relationships, groups, communities, and user-generated content (text, video files, audio files, and pictures) can all be used for mining. The scope of this work is limited to network structure mining and text mining (facts as well as opinions). Studying WeChat mining challenges as an instance of mobile social media challenges is relevant because the app combines the characteristics common to other mobile social networking and instant messaging sites, such as identity, conversation, sharing, presence, relationships, and groups.

3 WeChat mining challenges

Unlike Twitter and Facebook mining, on which much research and technical work has focused in the last few years [5], WeChat mining has received very little attention from the research community.

Though mining WeChat holds a lot of promise, access to this rich mine of data is obstructed by several challenges, which we address in the remainder of this paper.

3.1 Data extraction and pattern evaluation challenges

Social media mining is an emerging field with more problems than ready solutions. One of the basic steps in social media mining is data extraction, and the most commonly used method to collect data from social media is via application programming interfaces (APIs). The available API for WeChat can be obtained at http://developers.WeChat.com/WeChatapi. This API allows a programmer to develop an app that can send messages to WeChat users, either in their message boxes or in their WeChat Moments. It has two versions: an iPhone version and an Android version.

For now there is no publicly available API for collecting user-generated data from WeChat. This lack of an open API may be the most challenging issue for anyone who wants to mine WeChat. One way around this challenge is to develop an API for this purpose: interested programmers would need to learn the WeChat SDK platform and the Xcode environment in order to provide a free API to the general public. The new API should allow users to collect data from remote WeChat database servers onto a computer for mining purposes. The present limited API works only on mobile devices, whose computational power is too low for running mining algorithms on big data. There may still be other obstacles arising from internet legal issues in China (see Section 3.5). Even if this major API issue were solved, other data extraction challenges would remain:

• What has been called the big data paradox: while mobile social media data is undoubtedly big, the information about a particular entity can be very scarce.

• Sampling problems: without knowing the statistical distribution of the WeChat population in advance, how can we be sure that a sample is representative of the global data? Consequently, it becomes difficult to ensure that the findings obtained from mining reflect true patterns that are useful for business development or research.

• Mining results evaluation challenges: in classical data mining, data sets are divided into training and test sets. In the case of social media, how can one construct these training and test sets? Evaluating patterns in mobile social media mining seems nearly insurmountable.
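One common way to build training and test sets from social media posts, offered here only as an illustration and not as the author's method, is a chronological split: train on earlier posts and test on later ones, so the model is never evaluated on data older than what it learned from. The `Post` record and the `temporal_split` function are hypothetical names, and the labels are assumed to come from manual annotation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """Hypothetical record for one collected post."""
    user: str
    time: datetime
    text: str
    label: str  # assumed to come from manual annotation, e.g. "pos" / "neg"


def temporal_split(posts, train_fraction=0.8):
    """Sort posts by time; the earliest fraction trains, the rest tests."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ordered = sorted(posts, key=lambda p: p.time)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


if __name__ == "__main__":
    # Ten posts dated 10 March down to 1 March 2015.
    posts = [Post(f"u{i % 3}", datetime(2015, 3, 10 - i), f"post {i}", "pos")
             for i in range(10)]
    train, test = temporal_split(posts)
    print(len(train), len(test))  # 8 2
    print(max(p.time for p in train) < min(p.time for p in test))  # True
```

A chronological split does not by itself answer the sampling question raised above; it only keeps future information out of the training data.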


3.2 Mobile social media analysis challenges

Mobile social media have essentially the same structure as classical social media. Since they are networks through which individuals are connected by special links, they are generally modeled as graphs whose nodes are people and whose edges represent links between people. Many graph models for social media analysis have been proposed [2], among which the small-world model is the most widely used. Mining social media structure is an important aspect of social media mining. Unfortunately, the methods generally used to carry out data mining are poorly adapted to the graph structure of social networks: because of the size and dynamic properties of these graphs, data mining algorithms become ineffective when applied to social media.
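To make the graph model concrete, the sketch below stores an undirected social graph as adjacency sets (nodes are people, edges are links), generates a Watts-Strogatz graph as one standard formalization of the small-world model, and computes the average clustering coefficient. All function names and parameters are illustrative assumptions, not part of any WeChat tooling.

```python
import random
from itertools import combinations


def ring_lattice(n, k):
    """Undirected ring of n nodes, each linked to its k // 2 nearest neighbours per side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def watts_strogatz(n, k, p, seed=None):
    """Start from a ring lattice and rewire each lattice edge with probability p."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if u not in adj[v] or rng.random() >= p:
                continue  # edge already gone, or not selected for rewiring
            # New endpoint: no self-loop and no duplicate edge.
            candidates = [w for w in range(n) if w != v and w not in adj[v]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[v].discard(u)
            adj[u].discard(v)
            adj[v].add(w)
            adj[w].add(v)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


if __name__ == "__main__":
    lattice = watts_strogatz(1000, 10, 0.0)
    small_world = watts_strogatz(1000, 10, 0.1, seed=42)
    print(average_clustering(lattice), average_clustering(small_world))
```

With p = 0 no edge is rewired and the ring lattice keeps its clustering of 3(k - 2) / (4(k - 1)), 0.5 for k = 4; rewiring a small fraction of edges creates the shortcuts characteristic of small-world networks while preserving the number of edges.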

To illustrate why such algorithms struggle, consider some group detection algorithms used in social media:

• Graph partitioning by minimum cuts [6]: this algorithm can only be effective if the size of the graph is known in advance.

• Hierarchical clustering [7]: this method has a high time cost ($O(n^2 \log n)$) and is poorly suited to structures that are not inherently hierarchical.

• K-means clustering [8]: as with other graph partitioning algorithms, the number of partitions must be set in advance.

• Spectral clustering [9], which uses matrix representations of the graph and their eigenvalues to define clusters: here too, the size of the graph must remain constant.

Moreover, these dynamic properties of social networking graphs have not yet been adequately addressed in the literature.

3.3 Natural language processing challenges

Text messages or tweets are the most common communication method used by WeChat users in Taiwan, India, Hong Kong, the USA, Thailand, Indonesia, and of course mainland China. With more than 70 million users outside mainland China, WeChat messages are undoubtedly multilingual: they use many languages, including simplified and traditional Chinese, English, French, Spanish, and others. Language identification in tweets is a particular problem because of their short length and their use of language-independent tokens: hashtags, @mentions, numbers, URLs, and emoticons. These pose new challenges for text mining and natural language processing, so proper language identification tools should be employed when mining textual data from WeChat.
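As a rough illustration of the preprocessing this implies, the sketch below strips language-independent tokens (hashtags, @mentions, URLs, standalone numbers, and a few ASCII emoticons) and then guesses the dominant writing system from Unicode character names. Script detection is only a first step: it separates Chinese characters from Latin letters, but not English from French, nor simplified from traditional Chinese, for which a trained language identifier would still be needed. The regular expression and the function names are assumptions made for this example.

```python
import re
import unicodedata

# Language-independent tokens: hashtags, @mentions, URLs, standalone numbers,
# and a few ASCII emoticons such as :) ;-) :D :P
NOISE = re.compile(r"#\w+|@\w+|https?://\S+|\b\d+\b|[:;]-?[()DP]")


def strip_noise(text):
    """Remove tokens that carry no language signal and normalise spaces."""
    return " ".join(NOISE.sub(" ", text).split())


def dominant_script(text):
    """Guess the main writing system of a short message from Unicode names."""
    counts = {"Han": 0, "Latin": 0, "Other": 0}
    for ch in strip_noise(text):
        if not ch.isalpha():
            continue  # skip spaces, punctuation and emoji
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["Han"] += 1
        elif name.startswith("LATIN"):
            counts["Latin"] += 1
        else:
            counts["Other"] += 1
    if not any(counts.values()):
        return "unknown"
    return max(counts, key=counts.get)


if __name__ == "__main__":
    print(strip_noise("Great food #yum @bob http://t.co/x 123 :)"))  # Great food
    print(dominant_script("今天天气很好 #happy http://t.co/a"))  # Han
```

The emoticon pattern is deliberately crude; a production pipeline would use a fuller emoticon and emoji inventory.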

3.4 Opinion mining and sentiment analysis challenges

One of the most interesting areas of social media mining is opinion mining, or sentiment analysis; the two expressions are generally used synonymously in the literature [10]. This field of study analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes. In recent years a large amount of research has focused on opinion mining from the web in general, such as discussion forums, review sites, blogs, and e-commerce sites [11][1][12][13], and from social media in particular [14][15]. The purpose of opinion mining and sentiment analysis is to predict, for example, customers' preferences for a specific product, which is valuable for economic or marketing research. Opinion mining from social media can also aim to summarize opinions concerning a specific entity (event, product, policy, person). Social media is full of opinionated text, and the challenges faced by opinion mining in social media are inherited from those faced by natural language processing and text mining in this area.

There are many challenges in applying traditional opinion mining and sentiment analysis techniques to mobile social media like WeChat [14][16]. Tweets do not always follow a conversation thread, which leads to loss of contextual information and to ambiguity. Some works, [17] for example, have tried to reduce this contextual ambiguity using n-grams, but the issue remains a great challenge. Irony and sarcasm used to express opinions in mobile social media are also difficult for a machine to detect.
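To see why n-grams can help, compare unigram and bigram features for a short negated post: the bigram "not good" keeps the negation attached to the word it modifies, which unigrams lose. This is a generic sketch of n-gram extraction, not the specific method of [17]. Word n-grams assume whitespace-separated text, which does not hold for unsegmented Chinese, so a character n-gram variant is included as a hedged alternative.

```python
def word_ngrams(text, n):
    """Word n-grams from whitespace tokens (assumes space-separated text)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams, usable on unsegmented Chinese text."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


if __name__ == "__main__":
    print(word_ngrams("The camera is not good", 1))
    # ['the', 'camera', 'is', 'not', 'good']
    print(word_ngrams("The camera is not good", 2))
    # ['the camera', 'camera is', 'is not', 'not good']
    print(char_ngrams("不好吃", 2))  # ['不好', '好吃']
```

Bigrams only capture negation within a two-word window; longer-range context and sarcasm remain out of reach for this kind of feature.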

Most opinion mining techniques for social media make use of machine learning. Classifiers such as support vector machines (SVMs) and naive Bayes, built using supervised methods, perform well on sentiment polarity detection, but when applied to new domains their accuracy drops drastically [18][19][20].
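For reference, the sketch below is a minimal multinomial naive Bayes polarity classifier with add-one (Laplace) smoothing, trained on a handful of invented example posts. It shows the supervised setup described above. Because it learns word statistics tied to the training vocabulary and ignores unseen words, a model trained in one domain has little evidence to use in another. Whitespace tokenization and the class and function names are assumptions for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenization (an assumption; Chinese needs segmentation)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # words never seen in training carry no evidence
                score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    docs = ["great phone love it", "love the camera",
            "terrible battery hate it", "hate the screen"]
    labels = ["pos", "pos", "neg", "neg"]
    clf = NaiveBayes().fit(docs, labels)
    print(clf.predict("love this phone"))   # pos
    print(clf.predict("hate the battery"))  # neg
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.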

3.5 China internet censorship challenges

China is one of the countries that practice internet control and censorship. Methods of internet control include web content regulation techniques that decide whether some tweets should be present on social media (or any other website) or should be automatically


deleted [21]. Other censorship techniques consist of restricting user behavior, physical network connections, market orders, and technological standards [22]. These deletions and restrictions have a negative effect on data integrity and are likely to cause considerable difficulty in the process of data extraction.
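One way to gauge how deletions affect a collected dataset, similar in spirit to deletion studies such as [21], is to fetch the same posts twice and measure how many have disappeared in between. The sketch below assumes each crawl yields a collection of post identifiers; how such identifiers would be obtained from WeChat is left open, since no public collection API exists. The function name is hypothetical.

```python
def deletion_report(first_crawl, second_crawl):
    """Compare two crawls of post IDs.

    Returns the IDs present in the first crawl but missing from the second,
    and their share of the first crawl.
    """
    first, second = set(first_crawl), set(second_crawl)
    deleted = sorted(first - second)
    rate = len(deleted) / len(first) if first else 0.0
    return deleted, rate


if __name__ == "__main__":
    print(deletion_report(["a1", "a2", "a3", "a4"], ["a1", "a3", "a4", "a5"]))
    # (['a2'], 0.25)
```

A missing post may also have been removed by its own author or lost to a crawler error, so not every disappearance can be attributed to censorship.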

Whether the data in the WeChat database belong to Tencent in Shenzhen or to the central government of the People's Republic of China, the country's internet monitoring and control policy may, in either case, make it illegal for Tencent (or any other person or business) to develop an API that makes WeChat's massive data available to the public for mining purposes [23], except under strict government supervision.

4 Conclusion

In this short paper we used the well-known Chinese mobile social networking app WeChat as an instance to illustrate a variety of challenges in mobile social media data mining engineering and research. Most of these challenges could apply to other mobile social networking sites as well. There are legal challenges that depend on each country's internet laws and policies; China is a special case, where legal challenges arise from the country's internet control and censorship laws. There are also technical challenges, such as the development of new application programming interfaces (APIs), which requires learning new development tools such as the WeChat SDK and the Xcode environment. In addition, the size and dynamic characteristics of mobile social media graphs cause social media structure mining algorithms to work poorly. Other challenges concern the fields of natural language processing and opinion mining from social media, where obtaining adequate samples and applying proper evaluation to mining results seem insurmountable. Future research could be dedicated to addressing each of these challenges.

References

[1] Pippal Sanjeev, Batra Lakshay, Krishna Akhila, Gupta Hina, and Arora Kunal. Data mining in social networking sites: A social media mining approach to generate effective business strategies. International Journal of Innovations and Advancement in Computer Science, Volume 3, Issue 2, pages 22–27, April 2014.

[2] Zafarani Reza, Abbasi Mohammad Ali, and Liu Huan. Social Media Mining: An Introduction. Cambridge University Press, 2014.

[3] boyd Danah M. and Ellison Nicole B. Social network sites: definition, history, and scholarship. Journal of Computer-Mediated Communication 13, No. 1, pages 210–230, 2007.

[4] Kaplan Andreas M. and Haenlein Michael. Users of the world, unite! The challenges and opportunities of social media. Business Horizons 53, No. 1, pages 59–68, 2010.

[5] Russell Matthew A. Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites. O'Reilly Media, 2011.

[6] Nagamochi Hiroshi. Algorithms for the minimum partitioning problems in graphs. Electronics and Communications in Japan, Part 3, Vol. 90, No. 10, pages 63–78, 2007. Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J86-D-I, No. 2, pages 53–68, February 2003.

[7] Jonyer Istvan, Cook Diane J., and Holder Lawrence B. Graph-based hierarchical conceptual clustering. Journal of Machine Learning Research, pages 19–43, February 2001.

[8] Galluccio Laurent, Michel Olivier, Comon Pierre, and Hero Alfred O. Graph based k-means clustering. Signal Processing, Volume 92, Issue 9, pages 1970–1984, September 2012.

[9] Nascimento Maria C.V. and de Carvalho Andre C.P.L.F. Spectral methods for graph clustering: A survey. European Journal of Operational Research, Volume 211, Issue 2, pages 221–231, June 2011.

[10] Liu Bing. Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.

[11] Vinodhini G. and Chandrasekaran RM. Sentiment analysis and opinion mining: a survey. International Journal of Advanced Research in Computer Science and Software Engineering, pages 282–292, June 2012.

[12] Todi Aditi, Agrawal Anahita, Taparia Ankit, Lakhmani Nikhlesh, and Shettar Rajashree. An opinion-tree based flexible opinion mining model. International Journal of Engineering Science and Advanced Technology, Volume 2, Issue 3, pages 550–554, 2012.

[13] Siddiki Ahmad Tasnim and Aljahdali Sultan. Web mining techniques in e-commerce applications. International Journal of Computer Applications, Volume 69, No. 8, pages 39–43, May 2013.

[14] Karamibekr Mostafa and Ghorbani Ali A. A structure for opinion mining in social domains. In SocialCom/PASSAT/BigData/EconCom/BioMedCom, pages 264–271. IEEE, 2013.

[15] Maynard Diana, Bontcheva Kalina, and Rout Dominic. Challenges in developing opinion mining tools for social media.

[16] Rahmath P. Haseena. Opinion mining and sentiment analysis: challenges and applications. International Journal of Application or Innovation in Engineering and Management, Volume 3, Issue 5, pages 401–403, May 2014.


[17] Shelke Nilesh M., Deshpande Shriniwas, and Thakre Vilas. Survey of techniques for opinion mining. International Journal of Computer Applications, Volume 57, No. 13, pages 30–35, November 2012.

[18] Khairnar Jayashri and Kinikar Mayura. Machine learning algorithms for opinion mining and sentiment classification. International Journal of Scientific and Research Publications, Volume 3, Issue 6, June 2013.

[19] Singh Pravesh Kumar and Husain Mohd Shahid. Methodological study of opinion mining and sentiment analysis techniques. International Journal on Soft Computing, Vol. 5, No. 1, pages 11–21, February 2014.

[20] Buche Arti, Chandak M.B., and Zadgaonkar Akshay. Opinion mining and analysis: a survey. International Journal on Natural Language Computing, Volume 2, No. 3, pages 39–47, June 2013.

[21] Bamman David, O'Connor Brendan, and Smith Noah A. Censorship and deletion practices in Chinese social media. First Monday 17(3), 2012.

[22] Li Xiaoyu and Robbin Alice. How China regulates online content: a policy evolution framework. IADIS International Journal on WWW/Internet, Vol. 11, No. 3, pages 35–45, 2012.

[23] Khanna Rohan, Dhingra Vikram, and Choudhary Kavita. Internet censorship: Freedom vs security. International Journal of Computer Trends and Technology (IJCTT), Volume 4, Issue 8, pages 2695–2698, August 2013.
