http://www.iaeme.com/IJMET/index.asp 274 [email protected]
International Journal of Mechanical Engineering and Technology (IJMET)
Volume 8, Issue 12, December 2017, pp. 274–289, Article ID: IJMET_08_12_027
Available online at http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=12
ISSN Print: 0976-6340 and ISSN Online: 0976-6359
© IAEME Publication Scopus Indexed
OPINION MINING ANALYSIS IN BANKING
SYSTEM USING ROUGH FEATURE SELECTION
TECHNIQUE FROM SOCIAL MEDIA TEXT
N.Sumathi
Research Scholar,(Part-time Ph.D Category-B), Department of Computer Science,
Research & Development Centre, Bharathiar University, Coimbatore, India
Dr.T.Sheela
Research Supervisor, Dean Networking, HOD/IT,
Sri Sai Ram Engineering College, Chennai, India
ABSTRACT
Information technology and social media have become so embedded in daily life that it is
hard to imagine life without the Internet and social media. Social media is one of the
strongest channels for sharing people's feelings, thoughts and opinions about various
sectors. Part of this shared data is opinionated and can be analyzed to support decision
making in organizations. With roughly three billion individuals using social media
worldwide, banks ought to consider seriously how to improve their customer service and
manage banking risk by organizing the information gathered from social media. Banks have
used social media awareness to generate pricing paradigms for loans and other banking
products, and they can combine traditional scoring approaches with information available in
the public social domain, such as Facebook, Twitter, and other social networking sites.
Sentiment analysis is the process of extracting subjective information from various types
of data; in this paper it is applied to social media text. Opinion mining is one such
technique: it helps to recognize the subjective opinions expressed about a bank, and it
helps where manual analysis breaks down. The growing volume of text from daily social media
discussions, together with the growing complexity of text sources and topics, makes it
necessary to re-examine the typical sentiment feature extraction approaches. This paper
proposes a sentiment index and extracts features using the rough set method. The approach
helps to build an information system for classifying bank loan price and risk opinions from
social media and for creating strategies to improve wherever the bank falls short.
Keywords: Sentiment analysis, Rough set, Feature selection, classification, Bank
dataset, data preprocessing
Cite this Article: N.Sumathi and Dr.T.Sheela, Opinion Mining Analysis in Banking
System Using Rough Feature Selection Technique from Social Media Text,
International Journal of Mechanical Engineering and Technology 8(12), 2017, pp.
274–289.
http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=12
1. INTRODUCTION
Opinion mining, also called sentiment analysis, is a fast-growing area of computational
linguistics that explores human opinions. It analyzes what people think about products and
services, and it belongs to natural language processing. Retrieving opinions from various
media has become much easier because people share their views on numerous topics across
social networks such as Facebook and Twitter. Opinion mining studies the opinions and
reviews that people post about products on social media. For example, in banking, bankers
may want to know how satisfied customers are with loan prices, credit card prices and the
interest on a particular scheme.
In recent years a growing number of individuals have become connected through the Internet,
so the flow of information is always increasing. People all over the world post huge
volumes of opinionated information in various formats to various media services on the WWW.
Many organizations have an immense interest in what customers express and think about their
products. Automating this process helps to understand customers' thoughts and satisfaction,
and this motivates the research in sentiment analysis.
Users increasingly expect banks to use social media to deliver faster and more effective
customer service and financial advice. By analyzing the large amounts of data available on
social media such as Twitter, banks need to extract key features that enable them to
improve products, marketing, customer service, business performance and risk management.
Since social media is all about the customer experience and feelings, banks need to build
their social media strategies around the customer to determine loyalty, revenue and
profitability.
Social media also increases competition in the market; banks need to be aware of customer
opinion on social media to know their brand's position and reputation. People's opinions
are also an essential input for marketing products. When sentiment is negative, a bank
should be able to take action quickly, but for that to happen opinion detection techniques
are needed. One way to achieve this is to build a sentiment analysis system for the bank.
Social media could be embedded into a bank's entire network because it influences several
areas, such as CRM (customer relationship management), product and service design, and
risk management. Banks can review their services and product processes once they collect
feedback from their customers, and make changes based on the suggestions found in social
data. Furthermore, banks will need to establish key performance indicators to measure
their success as they progress in their social media journey. Figure 1 shows the social
media key performance indicators for a bank.
In this paper, we present a new method of opinion mining for analyzing opinions about risk
in textual disclosures by banks. We find an appropriate data set from the web, extract the
appropriate keywords for risk and loan price opinion analysis, and analyze risk and loan
price sentiment year by year. The resulting sentiment index scores quantify uncertainty,
positivity and negativity. Uncertainty relates to "uncertainties resulting in adverse
variations of profitability or in losses" (Bessis, 2002, p. 11). Negative opinion points to
current or future problems, and positive opinion might indicate overconfidence. We find
that the sentiment index scores reflect financial events rather than other major economic
crises at the bank over the last decade. Additionally, we test for correlations between the
quantitative risk indicator and the sentiment index scores.
Figure 1 Key Performance Indicators
The rest of the paper has four more chapters, ordered as follows. Chapter 2 describes
different techniques applied in opinion mining; the literature review includes a discussion
of lexicon-based approaches. Chapter 3 presents the models and the proposed approach,
including rough set feature selection for sentiment analysis and the feature selection
methods. Chapter 4 discusses the data set used for training and testing and presents the
experiments and results of this research work. The conclusion and future research are
discussed in the last chapter.
2. LITERATURE REVIEW
Extensive research has been carried out on different types of social media networks, such
as dynamic networks of individuals (Twitter, etc.) and online virtual communities
[2, 4, 16, 34, 41–43]. In pioneering work at MIT in 1996, Asavathiratham [2] developed a
model for the tractable representation of networked Markov chains. Concerning the network
dynamics of online virtual societies and communities, [16] proposed a method for various
interesting computations on a social network. Text or keyword classification based on
sentiment polarity has recently emerged as an attractive frontier in web mining. To see how
it works, use an online search engine such as Google to look up a new location or text, and
submit the query keyword. It would be useful to know what fraction of the matches Google
returns recommend the keyword as a travel destination [18]. Incorporating sentiment
analysis into search engines and text retrieval mechanisms enables a more effective and
functional service for network users [45]. Sentiment analysis has been used in various
applications such as news tracking, online forums, chat rooms and blogging. YouTube
introduced sentiment classification techniques to sort all its comments into "Poor" or
"Good" [44]. As a promising research area, text sentiment analysis has been widely studied
[1,3,26–28,33,35], where sentiment analysis is used for text classification tasks
[8,13,14,40]. Prevailing sentiment analysis approaches divide into two types: machine
learning [3,33] and semantic orientation approaches [1,26–28,33,35]. Languages that have
been studied include English [3,13,26–28], Chinese [33,35] and Arabic [1].
In 2015, Øye et al. [45] proposed sentiment analysis of Norwegian Twitter messages. The aim
of the work was to carry out typical sentiment analysis on three distinct datasets obtained
from Twitter: general Norwegian tweets, tweets about the Norwegian prime minister, Erna
Solberg, and tweets about Rosenborg, a Norwegian football team. Øye presented the analysis
based on the two-step approach described by Pang and Lee (2004) and found a precision of up
to 80% on polarity classification, and up to 76% when merging subjectivity detection with
polarity classification.
Wei and Gulla (2010) [46] presented an approach to sentiment learning aided by sentiment
ontology tree (SOT) structures. The SOT consists of various features and sub-features, with
a root entity, organized in a hierarchical style. A hierarchical learning algorithm was
applied to threshold and weight vectors to analyze the problem. That paper shows how to
extract features from social media, but the extracted features are organized in a tree
structure and some semantic meaning identification is lacking, so more efficient techniques
are needed to retrieve features.
Vidya et al. (2015) [47] used Twitter data to develop a sentiment analysis system for
various mobile phone providers in Indonesia. The motivation of that work was to analyze
brand reputation by extracting sentiments with regard to five products: internet services,
voice calls, Short Message Service (SMS), 3G and 4G. A metric, net brand reputation (NBR),
was used to compute customer satisfaction per service. The NBR metric was compared with the
Net Promoter Score (NPS), which is commonly used to compute customer satisfaction
(Satmetrix, 2017). That paper motivated us to analyze and find metrics that determine brand
reputation. Following it, we derive several metrics for a bank and determine the reputation
of its products.
According to David Hazarika [11], the social media domain is developing quickly. Banks and
financial institutions that are quick to synergize their business processes with their
social media strategies will be the quickest to respond to customer needs and to offer
customers the best experience. Social media can be seen as a double-edged sword: on one
side it can raise the possibility of security and privacy threats to banks and their
customers, while on the other side it will most certainly generate massive value. We
conclude that social media strategies have become imperative for the banking industry, and
banks will need to embark on their own social journey to protect their place in the future.
3. MODELS AND METHODOLOGY
Figure 2 Block diagram for proposed method
Our proposed system consists primarily of the following steps: data collection and
cleaning, assigning sentiment index values, generating the information system using rough
sets, and analyzing the sentiment. Figure 2 illustrates the proposed method. Three modules
are defined to integrate opinion mining: data collection, rough set feature extraction, and
result analysis based on the feature attributes. The first module covers content
acquisition and data cleaning for text sentiment analysis. The second module is a
lexicon-based approach that identifies and analyzes positive, negative and uncertain text
posted on Twitter about bank products. The third module extracts the features from the
posts and computes the sentiment index value for each. Our proposed system assigns an
integer value to each feature, with the sign showing its emotional polarity and the
absolute value its emotional intensity.
3.1 Data Preprocessing
Unstructured text data from Twitter is noisy, so data cleaning is a must to obtain good
output. The pre-processing of the data consists of the following steps:
3.1.1. Removal of text not relevant to bank products
Keeping empty posts or posts with information irrelevant to the particular products only
adds noise to the classification and extraction problem. So, their removal is a necessary
task for sentiment analysis.
3.1.2. Conversion to lower case
This step eliminates conflicts between the use of lower and upper case. All the posted text
is transformed into lower case, which makes it easier to use lexicons for the
classification.
3.1.3. Removal of stopwords
Stopwords are function words that connect sentences or continue a post, such as
prepositions and conjunctions in English. Examples include a, an, the, into, if, and, or.
Removing them helps to reduce the size of the dataset so that it consists only of the text
needed for the succeeding steps.
3.1.4. Special character removal
This step removes excessive whitespace, punctuation characters, special symbols and
numbers.
3.1.5. Stemming
Stemming is the process of deleting the suffixes and prefixes of words in the posted data;
after this step the dataset contains root (stem) words. The premise is that words with the
same stem describe a similar meaning in text. For example, develop, develops, developed,
developing and developers have the common stem or root word develop.
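The preprocessing steps of Section 3.1 can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the stopword list, the set of bank keywords
used to decide relevance, and the crude suffix stripper standing in for a real stemmer are
all hypothetical choices, not the lists or tools used in this work.

```python
import re

# Illustrative lists; both are assumptions, not the lists used in the paper.
STOPWORDS = {"a", "an", "the", "into", "if", "and", "or", "is", "to", "of", "for", "on"}
BANK_TERMS = {"loan", "bank", "credit", "card", "interest", "account"}
SUFFIXES = ("ing", "ers", "ed", "er", "es", "s")  # crude suffix stripper, not a real stemmer


def stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(post: str) -> list[str]:
    """Apply steps 3.1.2 to 3.1.5 to one post and return its stemmed tokens."""
    text = post.lower()                                   # 3.1.2 lower case
    text = re.sub(r"[^a-z\s]", " ", text)                 # 3.1.4 drop punctuation, symbols, digits
    tokens = text.split()                                 # also collapses extra whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]    # 3.1.3 stopwords
    return [stem(t) for t in tokens]                      # 3.1.5 stemming


def clean_corpus(posts: list[str]) -> list[list[str]]:
    """Drop empty or off-topic posts (3.1.1) and preprocess the rest."""
    cleaned = []
    for post in posts:
        tokens = preprocess(post)
        if any(t in BANK_TERMS for t in tokens):
            cleaned.append(tokens)
    return cleaned
```

Relevance is checked after stemming so that plural forms such as "loans" still match the
keyword "loan"; a production system would more likely use an established stemmer and a
curated stopword list.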
3.2 Lexicon-based Approach
3.2.1 Sentiment Classifier
This work performs sentiment analysis at the document level; in other words, each text is
categorized as positive, negative or uncertainty. The uncertainty group means that the text
posted by customers on Twitter is neither good nor bad about the bank product. We assume
that each posted text refers to a single bank product such as a loan or a credit card. The
following algorithm classifies the user text:
1. Every text in the dataset is classified as positive, negative or uncertainty.
2. If the total number of positive lexicon words about a product is greater than the total
number of negative words, the text is categorized as positive.
3. If the total number of positive lexicon words about a product is less than the total
number of negative words, the text is categorized as negative.
4. If the text is not categorized as positive or negative, it is categorized as
uncertainty.
Table 1 Examples for opinion words
Positive      Good, Excellent, easy, Secure Transaction
Negative      Connection problem, less Response, Poor
Uncertainty   Average, Somewhat ok, not good not bad
Table 1 lists some sentiment words concerning bank money transfer risk. The customer-level
sentiment index weight scores are computed for three sentiment lexicon classes, namely the
uncertainty lexicon, the positivity lexicon and the negativity lexicon.
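The sentiment index weight score of file j for a lexicon class is

$$S_j = \sum_{i} L_{i,j} \, G_i \, C_j$$

where $L_{i,j}$ is the local weight of text i in file j,

$$G_i = \log\left(\frac{N}{d_i}\right)$$

is the global weight, the inverse document frequency, where N denotes the number of
documents and $d_i$ is the number of customers who used text i at least once, and

$$C_j = \frac{1}{\sqrt{\sum_{i} \left(L_{i,j} \, G_i\right)^2}}$$

is the normalization factor of file j.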
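A weighting of this kind can be sketched as follows. The exact formula is an assumption:
the sketch takes the local weight to be the raw count of a term in a document, the global
weight to be the inverse document frequency log(N / d), and normalizes each document by the
Euclidean length of its weight vector. The function name `sentiment_index` and the toy
inputs are hypothetical.

```python
import math
from collections import Counter


def sentiment_index(docs: list[list[str]], lexicon: set[str]) -> list[float]:
    """Score each document for one lexicon class with normalized tf-idf weights.

    Assumed weighting (not confirmed by the source): local weight = raw term
    count, global weight = log(N / d), cosine normalization per document.
    """
    n_docs = len(docs)
    # doc_freq[t]: number of documents that use term t at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0.0:
            scores.append(0.0)  # empty document, or only terms that occur everywhere
            continue
        scores.append(sum(w for t, w in weights.items() if t in lexicon) / norm)
    return scores
```

Calling the function once per lexicon class (positivity, negativity, uncertainty) gives
three scores per document. Terms that occur in every document receive a global weight of
zero and therefore do not contribute.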
Algorithm 1 Text Sentiment Classification
procedure ClassifySentiment(Data)
    Positives = 0
    Negatives = 0
    for every word T in Data do
        if Lexicon(T) is positive then
            Positives = Positives + 1
        else if Lexicon(T) is negative then
            Negatives = Negatives + 1
        end if
    end for
    if Positives > Negatives then
        return "positive about products"
    else if Positives < Negatives then
        return "negative about products"
    else
        return "Uncertainty (Average Products)"
    end if
end procedure
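Algorithm 1 translates almost line for line into Python. The sketch below assumes the post
has already been tokenized and uses a tiny lexicon built from a few of the opinion words in
Table 1; the lexicon contents and the short return labels are illustrative choices.

```python
# Toy lexicon for illustration only; the words are taken from Table 1.
POSITIVE = {"good", "excellent", "easy", "secure"}
NEGATIVE = {"poor", "problem"}


def classify_sentiment(words: list[str]) -> str:
    """Label a tokenized post by comparing positive and negative lexicon hits."""
    positives = sum(1 for w in words if w in POSITIVE)
    negatives = sum(1 for w in words if w in NEGATIVE)
    if positives > negatives:
        return "positive"
    if positives < negatives:
        return "negative"
    return "uncertainty"  # tie, including posts with no lexicon words at all
```

For example, `classify_sentiment(["loan", "easy", "good"])` counts two positive hits and no
negative ones, so the post is labeled positive, while a post with equal counts falls into
the uncertainty class.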
After computing the sentiment index weight, the customer posts are filtered and categorized
to prepare them for the assessments. In particular, the data are separated by product and
grouped by year, separately for each bank.
3.3 Rough set information system and feature extraction
3.3.1 Information System
Rough set theory supports approximate reasoning in decision making by selecting objects or
attributes according to their relative membership in a body of knowledge. Knowledge is
represented by a table, called an information system, where rows denote objects and columns
denote attributes. An information system S is a pair S = (U, A) of a non-empty finite set
of objects U, the universe, and a non-empty finite set of attributes A. Let $X \subseteq U$
and $B \subseteq A$. The set X can be approximated using only the information in B by
constructing the lower and upper approximations of X, denoted $\underline{B}X$ and
$\overline{B}X$. The lower approximation is the set of members that certainly belong to a
given class: the B-lower approximation of X is
$\underline{B}X = \{x \in U : [x]_B \subseteq X\}$. The upper approximation is the set of
members that possibly belong to a given class: the B-upper approximation of X is
$\overline{B}X = \{x \in U : [x]_B \cap X \neq \emptyset\}$ [37]. Here $[x]_B$ is the
equivalence class of x, the set of objects that have the same values as x on every
attribute in B.
Figure 3 Rough set Theory
Figure 3 illustrates the lower approximation $\underline{B}X$ as the set of dark-gray
spots, while the dark-gray and light-gray spots together represent the upper approximation
$\overline{B}X$.
The roughness of the set X with respect to B can be described numerically as
$R_\alpha = 1 - \frac{|\underline{B}X|}{|\overline{B}X|}$. This means X is crisp with
respect to B if $R_\alpha = 0$, and X is rough if $R_\alpha > 0$ [38]. Let the universe U
be the set of pixels of an image. The universe U can be partitioned into a set of
non-overlapping windows (of size m × n, say), with each window considered as a granule G. A
granule is a group of pixels in the universe U drawn together by indistinguishability.
Thus, granulation involves the decomposition of a whole set into parts [37, 41, 38].
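The lower and upper approximations and the roughness measure can be illustrated with a
small sketch. It assumes the usual rough set definitions: objects with identical attribute
values form one equivalence class, and roughness is one minus the ratio of the sizes of the
lower and upper approximations. The five-post universe and the target set X are
hypothetical data.

```python
from collections import defaultdict


def equivalence_classes(table: dict[str, tuple]) -> list[set[str]]:
    """Group objects whose attribute tuples are identical (indiscernible)."""
    groups = defaultdict(set)
    for obj, values in table.items():
        groups[values].add(obj)
    return list(groups.values())


def approximations(table: dict[str, tuple], target: set[str]) -> tuple[set[str], set[str]]:
    """Return the (lower, upper) approximations of target."""
    lower, upper = set(), set()
    for block in equivalence_classes(table):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


def roughness(table: dict[str, tuple], target: set[str]) -> float:
    """1 - |lower| / |upper|; 0 means the target set is crisp."""
    lower, upper = approximations(table, target)
    return 1 - len(lower) / len(upper) if upper else 0.0


# Hypothetical universe: five posts described by two binary attributes.
posts = {"p1": (1, 0), "p2": (1, 0), "p3": (0, 1), "p4": (0, 1), "p5": (1, 1)}
X = {"p1", "p2", "p3"}  # posts assumed to carry positive sentiment
lower, upper = approximations(posts, X)
```

Here p3 and p4 are indiscernible but only p3 belongs to X, so their class is in the upper
approximation and not in the lower one; the lower approximation has two members, the upper
has four, and the roughness is 0.5.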
In rough set theory, a data set is represented as a table in which each row represents a
text posted by a customer and each column represents a sentiment lexicon class: positive,
negative or uncertainty [39, 40]. This table is called an information system. The set of
all elements is known as the universe.
Consider a universe U of elements. Formally, an information system I is a quadruple
I = (U, A, V, ρ), where
A is a non-empty finite set of attributes,
$V = \bigcup_{a \in A} V_a$ is the set of attribute values of all attributes, where $V_a$
is the set of possible values of a, and
$\rho: U \times A \to V$ is an information function such that for every element
$x \in U$, $\rho(x, a) \in V_a$ is the value of attribute a for element x.
The information system can also be viewed as an information table, where each element
$x \in U$ corresponds to a row and each attribute $a \in A$ corresponds to a column.
3.3.2 Decision system
A decision system is used to minimize the set of attributes needed to retrieve the text;
these are called condition attributes. The attribute that records the result obtained from
this set of feature attributes is called the decision attribute.
A very simple information system is shown in Table 2. There are seven cases or objects, and
two condition attributes.
Table 2 An Example Information System.
Term        Bank mission   Sentiment feature lexicon
policy      37             +1
Financial   25             +1
Bank        36              0
That         8              0
Loan        85             +1
Wont        23             -1
Wanted       8             -1
Any one posted data file contains only a subset of all individual terms; the rows
correspond to the terms used and give the number of times each was used by customers. The
third column gives the sentiment feature lexicon: positive is represented as +1, negative
as -1 and uncertainty as 0.
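As a small illustration of how a table like Table 2 can be turned into signed integer
values, the sketch below multiplies each term's count by its lexicon polarity, so that the
sign carries the polarity and the magnitude the intensity. Treating the raw count as the
intensity is an assumption made for this example, and the function names are hypothetical.

```python
# (term, occurrence count, lexicon polarity) rows copied from Table 2.
TABLE_2 = [
    ("policy", 37, +1), ("Financial", 25, +1), ("Bank", 36, 0), ("That", 8, 0),
    ("Loan", 85, +1), ("Wont", 23, -1), ("Wanted", 8, -1),
]


def signed_scores(rows):
    """Map each term to count * polarity: sign = polarity, magnitude = intensity."""
    return {term: count * polarity for term, count, polarity in rows}


def totals_by_class(rows):
    """Total occurrences per polarity class (+1 positive, 0 uncertainty, -1 negative)."""
    totals = {+1: 0, 0: 0, -1: 0}
    for _, count, polarity in rows:
        totals[polarity] += count
    return totals


scores = signed_scores(TABLE_2)
totals = totals_by_class(TABLE_2)
```

With the Table 2 rows, the positive terms account for 147 occurrences, the uncertainty
terms for 44 and the negative terms for 31.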
This a posteriori knowledge is expressed by one distinguished attribute called the decision
attribute; the process is known as supervised learning. Information systems of this kind
are called decision systems.
Customers post text about bank products on social media, discussing one, two or many
features of a product. Classifying the separate features from a massive database of reviews
is a difficult job. The information system for this work is built from the benchmark
dataset. The unstructured dataset is preprocessed in module one to remove unwanted text.
After this step, the dataset consists only of product features, sub-features and relevant
opinion words. The resulting information system consists of objects, which are the social
media customers who post about the bank; the product features are the condition attributes,
and the decision attribute is the class label. Table 3 shows the information system for the
bank loan interest product and financial risk.
Table 3 Information table of Bank Text
Object  Interest  Pre pay  Installment  Services  Response  Service charge  Document  Sentiment
O1      1         1        0            0         0         1               1         P
O2      1         1        0            0         0         1               1         P
O3      0         0        0            1         0         0               0         N
O4      0         1        1            1         0         0               0         P
O5      1         1        0            0         0         1               1         P
O6      0         0        0            0         1         1               0         N
O7      0         1        1            1         0         0               0         P
O8      1         0        1            0         1         0               0         N
O9      1         1        0            0         0         1               1         P
O10     0         0        1            1         0         1               1         P
O11     0         1        1            1         0         0               0         P
O12     0         1        1            1         0         0               0         P
O13     0         0        0            1         0         0               0         N
O14     1         1        0            0         0         1               1         P
O15     0         0        1            1         0         0               0         N
The bank deals with market risks by hedging against foreign exchange and interest rate
risk, with the intention of shielding its earnings and protecting the economic value of its
liabilities and assets. Foreign exchange risk is virtually fully hedged. Interest rate risk
arising from differences between lending and funding is kept at a modest level. Table 4 is
an information table on market risk.
Table 4 Information Table of Market risk
Object  Foreign exchange  Cross-currency  Interest rate  Credit spread  Sentiment
X1      1                 1               0              1              P
X2      1                 1               0              0              N
X3      0                 1               1              1              P
X4      1                 1               0              0              N
X5      1                 1               1              1              P
X6      1                 0               0              1              N
X7      0                 1               1              1              P
X8      1                 1               0              1              P
X9      1                 1               0              1              P
In the information tables, the value '1' represents the presence and '0' the absence of the
feature in the posted text. The characters 'P' and 'N' represent the value of the opinion
orientation decision attribute. An important idea of rough set theory is to detect
redundancies and dependencies among the information features. Lower and upper
approximations are used to find the decision boundary based on the equivalence classes.
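To make the link between an information table and its equivalence classes concrete, the
sketch below partitions the objects of Table 4 by their condition attribute values and
lists the objects that fall in the decision boundary, meaning classes whose members
disagree on sentiment. The function names and the optional attribute-subset argument are
illustrative choices.

```python
from collections import defaultdict

ATTRS = ("Foreign exchange", "Cross-currency", "Interest rate", "Credit spread")
# Rows of Table 4: object -> (condition attribute values in ATTRS order, sentiment).
TABLE_4 = {
    "X1": ((1, 1, 0, 1), "P"), "X2": ((1, 1, 0, 0), "N"), "X3": ((0, 1, 1, 1), "P"),
    "X4": ((1, 1, 0, 0), "N"), "X5": ((1, 1, 1, 1), "P"), "X6": ((1, 0, 0, 1), "N"),
    "X7": ((0, 1, 1, 1), "P"), "X8": ((1, 1, 0, 1), "P"), "X9": ((1, 1, 0, 1), "P"),
}


def equivalence_classes(table, attrs=ATTRS):
    """Partition objects by their values on the chosen attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    groups = defaultdict(list)
    for obj, (values, _) in table.items():
        groups[tuple(values[i] for i in idx)].append(obj)
    return list(groups.values())


def boundary(table, label, attrs=ATTRS):
    """Objects in classes that mix `label` with another sentiment."""
    region = []
    for block in equivalence_classes(table, attrs):
        labels = {table[obj][1] for obj in block}
        if label in labels and len(labels) > 1:
            region.extend(block)
    return region


classes = equivalence_classes(TABLE_4)
```

With all four attributes, the partition is {X1,X8,X9}, {X2,X4}, {X3,X7}, {X5}, {X6}, which
matches the risk column of Table 5, and the boundary for 'P' is empty. Restricting the
attributes, for example `boundary(TABLE_4, "P", ("Foreign exchange",))`, shows how dropping
attributes can make positive and negative posts indiscernible.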
Table 5 Equivalence Class
Equivalence class (Loan)   Equivalence class (Risk)
{O1,O2,O5,O14}             {X1,X8,X9}
{O3,O13}                   {X2,X4}
{O4,O7,O11,O12}            {X3,X7}
{O5,O9}                    {X5}
{O6}                       {X6}
{07}
{09}
Regarding the information system, the loan analysis has eight attributes and the risk
management analysis has five, counting the Sentiment decision attribute in each; based on
the equivalence classes, the attributes are reduced to four for deciding whether the
sentiment is positive or negative.
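One simple way to look for a reduced attribute set is an exhaustive search for the smallest
subsets of condition attributes that still keep the decision table consistent, that is, no
two objects agree on the chosen attributes but differ in sentiment. The sketch below runs
such a search on the rows of Table 4. It is a brute-force illustration under that
definition and is not claimed to be the reduction procedure used in this work; rough set
tools typically rely on discernibility matrices or heuristics instead.

```python
from itertools import combinations

ATTRS = ("Foreign exchange", "Cross-currency", "Interest rate", "Credit spread")
ROWS = [  # Table 4: condition values in ATTRS order, then sentiment
    ((1, 1, 0, 1), "P"), ((1, 1, 0, 0), "N"), ((0, 1, 1, 1), "P"),
    ((1, 1, 0, 0), "N"), ((1, 1, 1, 1), "P"), ((1, 0, 0, 1), "N"),
    ((0, 1, 1, 1), "P"), ((1, 1, 0, 1), "P"), ((1, 1, 0, 1), "P"),
]


def is_consistent(rows, idx):
    """True if rows that agree on the attributes in idx never disagree on sentiment."""
    seen = {}
    for values, label in rows:
        key = tuple(values[i] for i in idx)
        if seen.setdefault(key, label) != label:
            return False
    return True


def minimal_reducts(rows, attrs):
    """Smallest attribute subsets that keep the decision table consistent."""
    for size in range(1, len(attrs) + 1):
        found = [tuple(attrs[i] for i in idx)
                 for idx in combinations(range(len(attrs)), size)
                 if is_consistent(rows, idx)]
        if found:
            return found
    return []  # the table is inconsistent even with every attribute


reducts = minimal_reducts(ROWS, ATTRS)
```

The search is exponential in the number of attributes, which is acceptable for tables as
small as Tables 3 and 4 but would need a heuristic for larger feature sets.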
4. RESULTS AND EVALUATION
4.1. Dataset
In this system, the sentiment analysis technique requires a large amount of text data. The
purpose is to build a model that accurately classifies any type of bank text as negative
(-1), uncertainty (0) or positive (1). To generate an effective model for this problem, all
of the data consists of reviews from the bank product domain. All reviews were written by
bank customers and all posted text is in English.
The data set was fetched from Facebook and Twitter. Facebook pages occasionally contain an
empty review column where individuals left the review blank. These posts are treated as
uncertainty sentiment. All Twitter posts had to be annotated manually because there is no
rating attached to the posted text.
Figure 3 Dataset classification
Of the total of 653 posted texts in the dataset, 49% were positive, 42% were negative and
9% were uncertainty. Figure 3 shows the average occurrence for each text polarity class.
Negative texts clearly contain more long unwanted text, so compared with the other
sentiment classes the negative part requires more data cleaning.
[Figure 3 chart: "Average of Dataset Sentiment Class". Classes: Negative, Uncertainty,
Positive. Series: Unknown, Preposition, Conjunction, Special Symbol, White spaces, Repeated
word, Stemming. Vertical axis: 0 to 14.]
Feature selection is the procedure for choosing the minimal set of feature attributes
needed to retrieve the sentiment text for analyzing the bank product. The aim of the
feature selection method is to detect the subset of feature attributes in the dataset that
still produces accurate results for the analysis. The feature attributes were grouped into
the different classes shown in Table 5, and the bank products are analyzed with respect to
these classes.
4.2 Results and Evaluation
Precision is the fraction of the texts retrieved for a class that are actually relevant to
it. Recall is the fraction of the relevant texts about a product, out of all such bank
texts, that are retrieved.
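Under the standard definitions, precision and recall for one sentiment class can be
computed from paired lists of actual and predicted labels, as in the sketch below. The
function name and the six example labels are hypothetical and are not taken from the
dataset of this work.

```python
def precision_recall(actual: list[str], predicted: list[str], label: str) -> tuple[float, float]:
    """Precision and recall of one sentiment label over paired label lists."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == label and p == label)
    predicted_pos = sum(1 for _, p in pairs if p == label)
    actual_pos = sum(1 for a, _ in pairs if a == label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical labels for six posts (P = positive, N = negative, U = uncertainty).
actual = ["P", "P", "N", "U", "N", "P"]
predicted = ["P", "N", "N", "U", "P", "P"]
```

In this toy example two of the three posts predicted as 'P' are truly positive, and two of
the three truly positive posts are found, so both precision and recall for 'P' are 2/3.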
Figure 4 shows the accuracy of feature selection for the bank products, namely loan prices
and market risk. The algorithms were run on the equivalence classes of Table 5, with the
15-row and 9-row information tables, and the highest accuracy values are shown in Table 6.
Table 6 Accuracy Measurement
Equivalence class (Loan)   Accuracy   Equivalence class (Risk)   Accuracy
{O1,O2,O5,O14}             0.75       {X1,X8,X9}                 0.79
{O3,O13}                   0.6        {X2,X4}                    0.72
{O4,O7,O11,O12}            0.74       {X3,X7}                    0.65
{O5,O9}                    0.4        {X5}                       0.4
{O6}                       0.2        {X6}                       0.3
{07}                       0.2
{09}                       0.1
Figure 4 Accuracy Measurements
[Figure 4 chart: accuracy (vertical axis, 0 to 0.9) for feature sets F1 to F7. Series: Loan
prices, Market risk.]
Figure 5 Sentiment analysis of Loan product
Figure 6 Sentiment analysis of Market Risk
Figure 5 and Figure 6 show the measurements for the loan and market risk products based on
rough set feature selection. According to Figures 4, 5 and 6, the feature attributes
Interest, Pre pay, Service charge and Document give the highest accuracy in sentiment
classification. The features Pre pay, Services and Installment give the next highest
accuracy in loan price analysis. The features Foreign exchange, Cross-currency and Credit
spread give the highest accuracy in market risk analysis.
5. CONCLUSION
In this paper, a rough set based feature selection approach is used for sentiment analysis
in banking. It is capable of reducing the attributes needed to classify text posted by bank
customers on social media as positive, uncertainty or negative. We obtained accurate
results on loan pricing and market risk through the rough set method by selecting a minimal
set of feature attributes.
In future, we will use various classification methods for sentiment lexicons in banking
together with the rough set feature selection technique, and compare them with other
techniques to find which gives the most accurate results.
[Figure 5 chart: "Evaluation of Loan". Precision and Recall (vertical axis, 0 to 1) for
Negative, Uncertainty, Positive.]
[Figure 6 chart: "Measurement of Market Risk". Precision and Recall (vertical axis, 0 to 1)
for Negative, Uncertainty, Positive.]
REFERENCES
[1] K. Ahmad, Y. Almas, Visualising sentiments in financial texts? Proceedings of the Ninth
International Conference on Information Visualisation (2005) 363–368.
[2] C. Asavathiratham, The Influence Model: A Tractable Representation for the Dynamics of
Networked Markov Chains, Dept. of EECS, MIT, Cambridge, 2000, p. 188.
[3] P. Chaovalit, L. Zhou, Movie review mining: a comparison between supervised and
unsupervised classification approaches, Proceedings of the 38th Hawaii International
Conference on System Sciences, 2005.
[4] K.W. Cheung, J.T. Kwok, M.H. Law, K.C. Tsui, Mining customer product ratings for
personalized marketing, Decision Support Systems 35 (2) (2003) 231–243.
[5] J. Coble, D. Cook, R. Rathi, L. Holder, Iterative structure discovery in graph-based data,
International Journal of Artificial Intelligence Techniques 1–2 (14) (2005) 101–124.
[6] M. Dash, H. Liu, Feature selection for classification, Intelligent Data Analysis 1 (3)
(1997) 131–156.
[7] C.C. Freifeld, K.D. Mandl, B.Y. Reis, J.S. Brownstein, HealthMap: global infectious
disease monitoring through automated classification and visualization of internet media
reports, Journal of the American Medical Informatics Association 15 (2008) 150–157.
[8] J. Gaurav, A. Ginwala, Y.A. Aslandogan, An approach to text classification using
dimensionality reduction and combination of classifiers, Proceedings of the 2004 IEEE
International Conference on Information Reuse and Integration (2004) 564–569.
[9] Goswami, R.M. Jin, G. Agrawal, Fast and exact out-of-core k-means clustering, Fourth
IEEE International Conference on Data Mining (2004) 83–90.
[10] V. Guralnik, G. Karypis, A scalable algorithm for clustering protein sequences, Proc.
Workshop Data Mining in Bioinformatics (BIOKDD), 2001, pp. 73–80.
[10] K.F. Han, D. Baker, Recurring local sequence motifs in proteins, Journal of Molecular
Biology 251 (1) (1995) 176–187.
[11] David Hazarika, How Banks Can Use Social Media Analytics To Drive Business
Advantage, article,2010
[12] V. Hatzivassiloglou, K.R. McKeown, Predicting the semantic orientation of adjectives,
Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the
European Chapter of the ACL, New Brunswick, NJ, 1997, pp. 174–181.
[13] R.Q. Huang, J.H.L. Hansen, Dialect classification on printed text using perplexity
measure and conditional random fields, IEEE International Conference on Acoustics,
Speech and Signal Processing (2007) 993–996.
[14] T. Joachims, Text categorization with SVM: learning with many relevant features,
Proceedings of ECML, 10th European Conference on Machine Learning, 1998.
[15] J.I. Khan, S. Shaikh, Relationship algebra for computing in social networks and social
network based applications, 2006 IEEE/WIC/ACM International Conference on Web
Intelligence, 2006, pp. 113–116.
[16] N. Li, X. Liang, X. Li, C. Wang, D. Wu, Network environment and financial risk using
machine learning and sentiment analysis, Human and Ecological Risk Assessment (2009)
227–252.
[17] B. Pang, L. Lee, S. Vaithyanathan, Thumbs up? Sentiment classification using machine
learning techniques, Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP), 2002, pp. 79–86.
[18] T. Saegusa, T. Maruyama, Real-time segmentation of color images based on the K-means
Clustering on FPGA, International Conference on Field-Programmable Technology, 2007,
pp. 329–332.
[19] S. Schauland, A. Kummert, P. Su-Birm, I. Uri, Y. Zhang, Vision-based pedestrian
detection—improvement and verification of feature extraction methods and SVM-based
classification, IEEE Intelligent Transportation Systems Conference (2006) 97–102.
[20] Z.H. Sun, Y.X. Sun, Fuzzy support vector machine for regression estimation, IEEE
International Conference on Systems, Man and Cybernetics, vol. 4, 2003, pp. 3336–3341.
[21] S. Tan, J. Zhang, An empirical study of sentiment analysis for Chinese documents, Expert
Systems with Applications 34 (4) (2008) 2622–2629.
[23] D. Thanh-Nghi, J.D. Fekete, Large scale classification with support vector machine
algorithms, ICMLA 2007, Sixth International Conference on Machine Learning and
Applications, 2007, pp. 7–12.
[22] Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis lectures on human
language technologies, 5(1), 1-167.
[23] Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and
applications: A survey. Ain Shams Engineering Journal, 5(4), 1093-1113.
[24] Pak, A., & Paroubek, P. (2010, May). Twitter as a Corpus for Sentiment Analysis and
Opinion Mining. In LREC (Vol. 10, No. 2010).
[25] Gokulakrishnan, B., Priyanthan, P., Ragavan, T., Prasath, N., & Perera, A. (2012,
December). Opinion mining and sentiment analysis on a twitter data stream. In Advances
in ICT for emerging regions (ICTer), 2012 International Conference on (pp. 182-188).
IEEE.
[26] Hallsmar, F., & Palm, J. (2016). Multi-class sentiment classification on twitter using an
emoji training heuristic.
[27] Salas-Zárate, M. D. P., Medina-Moreira, J., Lagos-Ortiz, K., Luna-Aveiga, H., Rodríguez-
García, M. Á., & Valencia-García, R. (2017). Sentiment Analysis on Tweets about
Diabetes: An Aspect-Level Approach. Computational and mathematical methods in
medicine, 2017.
[28] Chiavetta, F., Bosco, G. L., & Pilato, G. (2016). A Lexicon-based Approach for Sentiment
Classification of Amazon Books Reviews in Italian Language.
[29] Hailong, Z., Wenyan, G., & Bo, J. (2014, September). Machine learning and lexicon
based methods for sentiment classification: A survey. In Web Information System and
Application Conference (WISA), 2014 11th (pp. 262-265). IEEE.
[30] Dong, Z., Dong, Q., & Hao, C. (2010, August). Hownet and its computation of meaning.
In Proceedings of the 23rd International Conference on Computational Linguistics:
Demonstrations (pp. 53-56). Association for Computational Linguistics.
[31] Musto, C., Semeraro, G., & Polignano, M. (2014). A comparison of lexicon-based
approaches for sentiment analysis of microblog posts. Information Filtering and Retrieval,
59.
[32] Park, S., & Kim, Y. (2016, June). Building thesaurus lexicon using dictionary-based
approach for sentiment classification. In Software Engineering Research, Management
and Applications (SERA), 2016 IEEE 14th International Conference on (pp. 39-44).
IEEE.
[33] Ding, X., Liu, B., & Yu, P. S. (2008, February). A holistic lexicon-based approach to
opinion mining. In Proceedings of the 2008 international conference on web search and
data mining (pp. 231-240). ACM.
[34] Thakkar, H., & Patel, D. (2015). Approaches for sentiment analysis on twitter: A state-of-
art study. arXiv preprint arXiv:1512.01043.
[35] Zagibalov, T., & Carroll, J. (2008, August). Automatic seed word selection for
unsupervised sentiment classification of Chinese text. In Proceedings of the 22nd
International Conference on Computational Linguistics-Volume 1 (pp. 1073-1080).
Association for Computational Linguistics.
[36] Tang, B., Kay, S., & He, H. (2016). Toward optimal feature selection in naive Bayes for
text categorization. IEEE Transactions on Knowledge and Data Engineering, 28(9), 2508-
2521.
[37] Yiyuan Cheng, Ruiling Zhang, Xiufeng Wang, Qiushuang Chen. Text Feature Extraction
Based on Rough Set in Fifth International Conference on Fuzzy Systems and Knowledge
Discovery, 2008 IEEE, pp.310-314.
[38] Hsun-Hui Huang, Yau-Hwang Kuo and Horng-Chang Yang. Fuzzy-Rough Set Aided
Sentence Extraction Summarization in Proceedings of the First International Conference
on Innovative Computing, Information and Control (ICICIC'06).
[39] Qiang Li, Jian-Hua Li, Gong-Shen Liu, Sheng-Hong Li. A Rough Set based Hybrid
Feature Selection Method For Topic Specific Text Filtering in Proceedings of the Third
International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August
2004, pp.1464-1468.
[40] Richard Jensen and Qiang Shen. Semantics-Preserving Dimensionality Reduction: Rough
and Fuzzy-Rough-Based Approaches in IEEE Transactions on Knowledge and Data
Engineering, Vol. 16, No. 12, December 2004, pp.1457-1471.
[41] Zdzisław Pawlak. Rough set theory and its applications in Journal of Telecommunications
and Information Technology 3/2002, pp.7-10.
[42] Chikersal, P., Poria, S., & Cambria, E. (2015, June). SeNTU: sentiment analysis of tweets
by combining a rule-based classifier with supervised learning. In Proceedings of the
International Workshop on Semantic Evaluation, SemEval (pp. 647-651).
[43] Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval
(Vol. 1, No. 1, p. 496). Cambridge: Cambridge university press.
[44] Severyn, A., & Moschitti, A. (2015, August). Twitter sentiment analysis with deep
convolutional neural networks. In Proceedings of the 38th International ACM SIGIR
Conference on Research and Development in Information Retrieval (pp. 959-962). ACM.
[45] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed
representations of words and phrases and their compositionality. In Advances in neural
information processing systems (pp. 3111-3119).
[46] Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv preprint
arXiv:1408.5882.
[47] Øye, J. A. (2015). Sentiment Analysis of Norwegian Twitter Messages. Master's thesis.
[48] Wei, W. and Gulla, J. A. (2010). Sentiment Learning on Product Reviews via Sentiment
Ontology Tree. In Proceedings of the 48th Annual Meeting of the Association for
Computational Linguistics, pages 404–413. Association for Computational Linguistics.
[49] Vidya, N. A., Fanany, M. I., and Budi, I. (2015). Twitter Sentiment to Analyze Net Brand
Reputation of Mobile Phone Providers. 72:519–526.
[50] Myneni Madhu Bala, M. Srinivasa Rao and M Ramesh Babu, Sentiment Trends on
Natural Disasters using Location Based Twitter Opinion Mining, International Journal of
Civil Engineering and Technology (IJCIET) Volume 8, Issue 8, August 2017, pp. 9-19
[51] Rashid Ali, Pro-Mining: Product Recommendation Using Web-Based Opinion Mining,
International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue
6, November - December (2013), pp. 299-313
[52] Sandip S. Patil and Asha P. Chaudhari, Classification of Emotions from Text Using SVM
Based Opinion Mining, International Journal of Computer Engineering & Technology
(IJCET), Volume 3, Issue 1, January- June (2012), pp. 330-338
[53] Dr. Jamshed Siddiqui, An Overview of Opinion Mining Techniques, International Journal
of Advanced Research in Engineering and Technology (IJARET), Volume 4, Issue 7,
November - December 2013, pp. 176-182