
TISA: Topic Independence Scoring Algorithm

Justin Martineau¹, Doreen Cheng¹, and Tim Finin²

¹ Samsung Information Systems North America
² University of Maryland Baltimore County

Abstract. Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.

To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best performing sentiment model published on the popular 25 category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISA's models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never before seen topic areas.

1 Introduction

Text analysis techniques, such as sentiment analysis, are valuable tools for business intelligence, predicting market trends, and targeting advertisements. This technology is especially salient because written works include tweets, Facebook posts, blog posts, news articles, forum comments, and any other sample of electronic text that has become prevalent due to the growth of the web.

Textual analysis applications need to operate over a wide and unpredictable array of topic areas, often in real-time. However, current approaches are unable to reliably and accurately operate in real-time for new domains.

Text analysis on a wide array of topic areas is difficult because word meaning is context sensitive. Word sense disambiguation issues are one reason why classifiers trained for one topic area do poorly in other topic areas. The linguistic community has spent a great deal of effort trying to understand the differences between word senses by building linguistic resources such as WordNet [5], WordNet Affect [12] and Senti-WordNet [1]. Word sense disambiguation is still challenging.

Fortunately, word sense disambiguation issues can be sidestepped for specific problems. Consider sentiment polarity classification, which is the binary classification task where either the author approves of, or the author disapproves of, the specific topic of interest. For sentiment polarity classification knowing word meaning is irrelevant, but knowing word connotation is crucial. Consider the following example: "I proudly wore my new shirt to the bank." It is irrelevant whether the bank is a financial institution or a river bank because both senses of the word bank have no sentimental connotation for apparel. Thus, the word sense disambiguation problem can be simplified into a word connotation calculation. By extension to text classification: knowing the word's sense is irrelevant, but knowing its class bias for a topic area is sufficient.

We introduce a method to determine topic independent class bias scores for words. These scores can be used to build bag-of-words models that operate well across a wide array of diverse topics. Creating topic independent word scores is simple when there exists labeled data from multiple domains. Bias scores for a word can be calculated in each topic area using your machine learning algorithm of choice. A function can then be applied to these scores to determine a topic independent class bias score for the word. Intuitively, to measure topic independence, it makes sense to observe the variance of a word's class bias in multiple topic areas. We introduce our Topic Independence Scoring Algorithm as a method to calculate topic independent class bias scores from a set of existing topic area specific class bias scores.

Since our Topic Independence Scoring Algorithm uses only bias scores produced by another supporting machine learning algorithm, it has several useful properties. First, the supporting machine learning algorithm can be swapped out. Machine learning experts can use our algorithm with the most appropriate algorithm for the task at hand. Second, our algorithm works on models, not training data. This is very valuable in industrial settings where the training data may be lost or inaccessible due to business reasons. Alternatively, this is useful when the expertise to tune the original algorithm is no longer available, but the model still remains. Finally, the topic independence scoring algorithm can be evaluated against the algorithm that produced the topic area specific scores. This allows us to more effectively evaluate the value of topic independence scoring.

As a use case and for evaluation purposes we build a topic independent model for sentiment analysis that is highly accurate across 20 never before seen test topic areas. Our topic independent model is even more accurate than the supporting machine learning algorithm in the test domains using 10-fold cross-validation. Using our algorithm, we built a domain-independent sentiment model from five product review categories in the Amazon product reviews dataset [2] and evaluated it upon 20 additional product categories. Our classifier significantly outperforms the classifiers built specifically for each of the 20 product review categories. The baseline classifiers built specifically for the 20 test domains were almost twice as likely to make an error as our domain independent model.


Fig. 1. Distribution of topic independence by positive vs. negative bias across 25 topic areas.

2 Understanding Topic Independence

Our approach introduces the groundbreaking concept of term level topic independence, which is the degree to which a term's orientation to a class remains the same when measured across multiple topics [7]. Word sense disambiguation is one reason why classifiers trained in a single domain do poorly in other domains. The linguistic community has spent a great deal of effort trying to understand the differences between word senses by building linguistic resources such as WordNet [5], WordNet Affect [12] and Senti-WordNet [1]. Topic independence replaces a challenging word sense disambiguation problem with a clearly defined mathematical counting problem in which we need to count the different orientations a term has across multiple topic areas. This concept enables simple and fast computation for topic independent text analysis, and is therefore a very useful and important new concept.

We shall further explain topic independence using sentiment analysis as an example. A term can have either a positive, a negative, or a neutral connotation when it is used in context. Framed as a binary classification problem, the presence of any term is either an indicator of positive sentiment, an indicator of negative sentiment, or it has no class bias. This bias can be determined in context by determining if documents in that context (aka domain or topic area) are more likely to be positive or negative when that term is present. Given a set of different contexts we can count the number of contexts where the term is positive and the number of contexts where the term is negative. In Figure 1 we chart these values along the x and y axes for every term in the popular 25 category Amazon Product Reviews Dataset [2]. This chart shows why our Topic Independence Scoring Algorithm is so important.
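The counting described above is straightforward to implement. The sketch below (illustrative names and data, not the authors' code) takes per-topic class bias scores for each term and counts the topic areas in which the term leans positive versus negative, which is the pair of values plotted in Figure 1.

```python
# A minimal sketch of the context counting behind Figure 1.
# `topic_scores` maps term -> {topic area: class bias score}; a positive
# score means the term indicates positive sentiment in that topic area.

def orientation_counts(topic_scores):
    """Return term -> (number of topics where positive, where negative)."""
    counts = {}
    for term, by_topic in topic_scores.items():
        pos = sum(1 for s in by_topic.values() if s > 0)
        neg = sum(1 for s in by_topic.values() if s < 0)
        counts[term] = (pos, neg)
    return counts

# Made-up scores for illustration only:
scores = {
    "excellent": {"books": 0.9, "dvds": 0.8, "music": 0.7},
    "waste": {"books": -0.6, "dvds": -0.9, "music": -0.5},
    "long": {"books": 0.4, "dvds": -0.3, "music": 0.1},
}
counts = orientation_counts(scores)
```

A term like "excellent" lands on the purely positive axis, while a context-sensitive term like "long" sits between the axes.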

Sentimental topic independence is a matter of degree: there is almost always some situation where a normally positive word or phrase will have a negative connotation. The topic independent sentiment bias of a term should be based not only upon its sentimental strength in most situations; it must also be weighted by its reliability and uniformity. Put another way, the exceptions are so frequent that they must be accounted for in the general rule.

Figure 1 shows that there are only 11 terms that have a positive sentimental orientation in all 25 product review categories, while 16 terms have a negative sentiment orientation. The 11 most topic independent positive terms are: "excellent", "highly recommend", "the best", "best", "an excellent", "I love", "love", "wonderful", "a great", "always", and "recommend". The most topic independent negative terms include: "don't waste", "your money", "waste", "waste your", "would not", "money", "disappointed", and "worse". These terms are very revealing, but are not enough to cover a representative sample of any given text document.

The vast majority of all terms, over two million, are unique to exactly one product category in our dataset. From that peak, the total volume of terms falls off very rapidly according to the degree of topic independence. This implies that we need to properly scale the sentiment strength scores for terms with their degree of sentimental topic independence in order to use the less topic independent terms without overpowering the more topic independent terms.

3 Approach

The unique idea of our approach is to build a topic independent model by scoring terms based upon how much their class bias shifts as observed across many topics. By doing this irrespective of the target topic area where the model will be applied, we can be reasonably confident that the model will work well for any topic area. This contrasts quite sharply with domain adaptation methods that seek to adapt a model built for one domain into a model that will better fit another specific domain. This kind of custom fitting makes it increasingly likely that such a model will need to be adapted again for the next domain that must be operated on. Furthermore, this kind of custom fitting to a single dataset is more likely to overfit artifacts in those datasets than a model that must fit multiple different domains, since artifacts can be cross-checked with other domains. Domain independent models are much more useful than single domain models because they are more broadly applicable and less susceptible to artifacts and other noise.


Training a classifier with out-of-domain data can be accurately performed if you can answer two key questions:

1. For any term, what is that term's class bias in the source domain(s)?
2. From this bias, what can be concluded about its bias in the target domain?

The first question is fairly straightforward and easy to answer using standard techniques for supervised machine learning. Delta TFIDF [8] weights work particularly well for this task [9], but they can easily be replaced as the state-of-the-art advances.

Answering the second question is difficult for current domain adaptation approaches because they model the situation as a relationship between a predetermined training topic area and the target topic area. This setup assumes a different relationship between each pair of topics.

Question two is very difficult to answer with that assumption, so let us instead assume that the class bias of a term is equally likely to shift between any randomly selected pair of topics. This implies that we can predict how likely any given term's class bias is to shift when applied to another arbitrary topic area simply by observing how frequently it actually shifts class bias between multiple topic areas. Similarly, we can observe the magnitude of these class bias shifts to determine the likely magnitude of the class bias for other arbitrary topic areas.

Term level topic independence class bias scores need to measure and unify the following semantics:

– Sensible orientation: A term's class orientation should agree with its overall orientation in the set of topic areas.

– Strength: Terms with higher scores in the topic areas should have higher scores.

– Broad applicability: Terms that are used in more topic areas should have higher scores.

– Uniform meaning: Terms with more uniform topic area scores should have higher scores.

To measure these semantics we introduce our Topic Independence Scoring function in Equation 1. Strength can be measured with a simple average. This average should also give us a sensible orientation. A strongly oriented term is more valuable as it becomes more broadly applicable, so multiplying the strength score and the applicability score makes sense. The uniform meaning metric is difficult. Variance is not a good choice since variance scores increase with dis-uniformity, have an undefined range, and when all the values are multiplied by a constant the variance goes up by the square of the constant. Attempting to address the dis-uniformity problem by dividing the other scores by the variance is not a good solution because this can cause divide by zero problems and because of the rate at which the variance score changes. A good way to score uniformity is to use the geometric mean of the topic area scores. The geometric mean is a good choice because it has a predefined range, with a maximum equal to the arithmetic mean when the values are totally uniform and with scores dropping as uniformity decreases. This final uniformity term should be multiplied with the earlier calculations because a strong, broadly applicable term is more valuable when the strong scores are more uniform.
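The argument for the geometric mean can be checked with a quick calculation. In this sketch, two made-up sets of topic area scores share the same arithmetic mean, but the geometric mean rewards the uniform set and penalizes the skewed one:

```python
# Numeric check of the uniformity argument: the geometric mean of the
# scores equals their arithmetic mean only when the scores are uniform,
# and drops as they spread out while the arithmetic mean stays the same.
import math

def arith_mean(xs):
    return sum(xs) / len(xs)

def geo_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

uniform = [0.5, 0.5, 0.5, 0.5]  # same strength in every topic area
skewed = [0.9, 0.9, 0.1, 0.1]   # same arithmetic mean, far less uniform

# arith_mean(uniform) and geo_mean(uniform) are both 0.5;
# arith_mean(skewed) is still ~0.5, but geo_mean(skewed) falls to ~0.3.
```

This is exactly the behavior Equation 1 relies on: uniform topic area scores keep the full strength, dis-uniform ones get discounted.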

Given that:

D_t is the number of topics that term t occurs in.

S_{d,t} is the class bias score for term t in topic area d.

TIS(t) is the feature value for term t.

We calculate Topic Independence Scores with the following formula:

TIS(t) = ( Σ_{d=1}^{D_t} S_{d,t} ) × ( Π_{d=1}^{D_t} |S_{d,t}| )^{1/D_t}    (1)

The TISA function can be used to create a topic independent model from a set of existing topic dependent models. Algorithms such as SVMs and Logistic Regression use weight vectors to produce judgments. With a set of topic area specific models built by such algorithms, the TISA function can produce a topic independent weight that can be stored in a new weight vector. This new weight vector can be used to do topic independent classification using the same classification algorithm that produced the original topic area specific weight vectors. Topic independent classification with TISA can be easily used with a wide variety of popular machine learning algorithms: there is a very low adoption barrier.
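As a sketch of how this works (assumed data layout, not the authors' implementation), Equation 1 can be applied directly to a set of topic-specific weight vectors represented as term-to-score dictionaries:

```python
# A sketch of Equation 1: the sum of a term's per-topic scores (its
# average strength times its applicability D_t) multiplied by the
# geometric mean of their absolute values (its uniformity).
import math

def tisa_weights(topic_models):
    """topic_models: one dict per topic area mapping term -> class bias
    score S_{d,t}, e.g. weight vectors from per-topic Delta IDF models."""
    tis = {}
    for term in set().union(*topic_models):
        scores = [m[term] for m in topic_models if term in m]  # the D_t scores
        strength = sum(scores)  # arithmetic mean times D_t
        uniformity = math.prod(abs(s) for s in scores) ** (1 / len(scores))
        tis[term] = strength * uniformity  # Equation 1
    return tis

# Illustrative scores only:
books = {"excellent": 0.9, "plot": 0.6, "boring": -0.5}
dvds = {"excellent": 0.8, "boring": -0.6}
weights = tisa_weights([books, dvds])
```

The resulting `weights` dictionary is the new topic independent weight vector, usable with the same dot product classifier as the source models.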

4 Evaluation

In this evaluation we demonstrate how to build topic independent sentiment models using our topic independence scoring algorithm. We demonstrate that:

1. Topic independent sentiment models outperform in-topic models.

2. Topic independent models use additional out-of-topic training data more effectively than alternative techniques including:

(a) Weighted voting with multiple models.

(b) Building a single model on the union of multiple topic area datasets.

3. Topic independent sentiment models can be used to find revealing and informative topic specific vocabulary.

Our topic independent sentiment model is 89.6% accurate when measured over 20 additional held-out test topic areas, with a low variance of 5.05 percentage points. Our approach is the most accurate approach published on this dataset.


4.1 Test 1: TISA vs. In-topic Models

This test evaluates our topic independence scoring algorithm as a method for domain independent sentiment classification using 20 different held-out test topic areas.

For our baseline we used the standard 10-fold cross-validation methodology in each of the 20 test topic areas. For this baseline we chose to use the Delta IDF [7] classification algorithm, which is a slight modification of the Delta TFIDF document feature weighting algorithm [8]. To train a Delta IDF model, calculate each feature in the bag-of-words as shown below and add them to a weight vector. Given that:

|P_t| is the number of positively labeled training documents with term t.
|P| is the number of positively labeled training documents.
|N_t| is the number of negatively labeled training documents with term t.
|N| is the number of negatively labeled training documents.
V_t is the feature value for term t.

V_t = log2( ((|N| + 1)(|P_t| + 1)) / ((|N_t| + 1)(|P| + 1)) )    (2)
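Equation 2 amounts to a smoothed log-ratio of a term's positive and negative document frequencies, and can be sketched in a few lines (the helper name is assumed for illustration):

```python
# A sketch of the Delta IDF weight in Equation 2: positive when term t
# favors positively labeled documents, negative when it favors negatively
# labeled documents, with add-one smoothing throughout.
import math

def delta_idf(n_pos_with_t, n_pos, n_neg_with_t, n_neg):
    return math.log2(((n_neg + 1) * (n_pos_with_t + 1))
                     / ((n_neg_with_t + 1) * (n_pos + 1)))

# A term in 40 of 100 positive and 5 of 100 negative documents leans
# positive: the weight is log2(41/6), roughly 2.77.
weight = delta_idf(40, 100, 5, 100)
```

A term with balanced document frequencies gets a weight near zero, so uninformative terms contribute little to the dot product classifier.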

We need to balance the positive vs. the negative bias because we know that the datasets have been class balanced by the original author of the dataset. Follow the procedure described below.

Bias Balancing Procedure:

1. Create a copy of the weight vector and call it the positive vector. Call the original vector the negative vector.

2. For every feature in the positive vector, if the feature value is less than zero, set the value to zero.

3. For every feature in the negative vector, if the feature value is greater than zero, set the value to zero.

4. L2 normalize the positive vector.
5. L2 normalize the negative vector.
6. Add the positive and negative vectors together and return the answer.

For our classification function we use the dot product of the document with the weight vector. Data points with a dot product greater than or equal to zero are positive; otherwise the point is negative.
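Under the assumption that weight vectors and documents are stored as term-to-value dictionaries, the bias balancing procedure and the classification function might look like this sketch (not the authors' code):

```python
# A sketch of the six-step bias balancing procedure and the dot product
# classifier. Weight vectors and documents are term -> value dicts;
# absent terms are implicitly zero.
import math

def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)

def bias_balance(weights):
    positive = {t: v for t, v in weights.items() if v > 0}  # steps 1-2
    negative = {t: v for t, v in weights.items() if v < 0}  # step 3
    positive = l2_normalize(positive)                       # step 4
    negative = l2_normalize(negative)                       # step 5
    balanced = dict(positive)                               # step 6
    for t, v in negative.items():
        balanced[t] = balanced.get(t, 0.0) + v
    return balanced

def classify(document, weights):
    score = sum(v * weights.get(t, 0.0) for t, v in document.items())
    return "positive" if score >= 0 else "negative"

balanced = bias_balance({"good": 3.0, "bad": -4.0})
```

After balancing, the positive and negative halves of the vector have equal L2 mass, so neither class dominates the dot product by default.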

To keep our comparison uniform and meaningful we apply the same bias balancing procedure and use the same classification function for both TISA and Delta IDF. Using the bias balancing procedure on TISA vectors is a good idea because the overall class balance is topic area dependent. For example, while most people love their digital cameras, they absolutely hate their anti-virus software.

To further eliminate external factors we use the Delta IDF algorithm to produce the set of domain specific feature scores used by TISA in Equation 1.


One reason that Delta IDF was selected is that it does not have any tunable parameters. This eliminates concerns that the five Delta IDF models used to create the TISA model were better tuned by the researcher than the baseline models. These choices remove confounding factors that could be responsible for differences between the in-domain models and the topic independent model.

We used TISA to build our topic independent model from a set of five topic dependent models. We chose to use the books, DVDs, electronics, kitchen appliances, and music topic area models because they are the most popular domains. Each of these topic dependent models was trained using Delta IDF with all the labeled data for the given topic area. The other 20 topic areas were held out as test topic areas. This matches real world situations where there exists more labeled data for popular topic areas and far less labeled data for other areas.


Target Category    In-Dom Model    TISA Model
Apparel                 89.17          89.90
Automotive              80.92          85.99
Baby                    89.41          90.32
Beauty                  85.38          90.62
Camera                  86.54          91.56
Cell Phone              83.66          83.82
Comp Games              72.77          88.04
Food                    76.41          88.86
Grocery                 84.25          89.14
Health                  87.36          89.31
Instruments             84.28          90.32
Jewelry                 85.32          89.44
Magazines               85.40          89.68
Office                  76.32          89.91
Outdoor                 84.13          92.41
Software                79.44          87.43
Sports                  87.09          90.24
Tools                   56.67          94.74
Toys                    86.87          90.40
Video                   84.19          89.46

Average                 82.28          89.60
Variance                55.70           5.05

Table 1. A general model built by using TISA to combine Delta IDF scores on data about books, DVDs, electronics, kitchen appliances, and music does very well on 20 different product categories when compared to in-domain models built using Delta IDF on each of the categories.


On average our topic area independent model is 89.60% accurate, which is a statistically significant improvement over the 82.28% accurate product area specific Delta IDF baselines at the 99.9% confidence level. Table 1 shows the accuracy of our TISA model compared to the baseline for each of our 20 test product review categories. Please note that the low accuracy of the tools baseline is not a mistake. We will discuss it in greater detail in the next section.

Unlike other algorithms, TISA is highly accurate on every topic area with very low variance. Even though many of the topic areas are substantially different from TISA's training data, our TISA model is more accurate and nearly 11 times more stable in terms of variance than the domain specific models. While domain adaptation algorithms try to exploit the relationship between topic areas, TISA attempts to minimize the effects of these relationships. This decouples TISA's training topic areas from its testing topic areas. This has the added benefit of allowing researchers working with TISA to use labeled data from any topic area. This can allow researchers to avoid using low quality topic area datasets, such as topic areas with very little data, harder data points to classify, or low inter-annotator label agreement.

Table 2 illustrates the difference between topic independent term scores produced by TISA and topic dependent scores that were used as input to TISA. This table shows the top 50 most negative and most positive words or pairs of words for TISA and the baseline models. The terms highlighted in the table show that TISA's most important terms are very general purpose, while the terms in the input books model are very specific to the books topic area. These example terms support our argument that TISA favors topic independent bias terms.

The product specific baselines built using Delta IDF make for an excellent comparison. These product specific baselines are not straw men; they have been shown to outperform Support Vector Machines on this dataset [7]. By correctly setting up our experiment we have eliminated confounding factors and can conclude that the quality of the models is responsible for the difference between the two algorithms. By evaluating TISA against the Delta IDF algorithm used to create its constituent sub-models, we negate any potential objections that our improvement was due to the difference between the baseline algorithm and the algorithm used to create the sub-models. Thus the difference between the two models comes from either the intelligent combination of models using TISA, or the amount and quality of the training data. Both of these are good points for TISA, since TISA allows the researcher to freely select datasets without respect to the topic area that the model will be used on.

4.2 Test 2: TISA vs. Ensemble Methods

Skeptical readers might object to comparing TISA against in-domain Delta IDF because TISA is using more total labeled data. In machine learning, it is well known that using more training data will improve accuracy, but it is also well known that using training data that is not similar to the test data will hurt


TISA Identifies General Terms by Decreasing the Score of Topic Specific Terms

Positive Terms (Books Domain / TISA) and Negative Terms (Books Domain / TISA):

must for highly recommended waste your waste your
magnificent only complaint not worth very disappointed
only complaint must for two stars two stars
worth every worth every very disappointing a refund
excellent read great addition worst book refund
a must delighted uninteresting don't waste
wonderful book a must very disappointed waste of
delighted great buy sorry but not recommend
definitely worth every penny don't waste your money
great resource excellent for a joke not worth
excellent overview an outstanding waste of zero stars
excellent reference must-have not waste very disappointing
a delight well worth very poorly complete waste
essential reading another great is poorly save your
must-have for definitely worth save your very poorly
my clients a must-have a disappointment avoid this
weaves and allows poor quality not waste
great addition my only no new waste
detailed account great way refund a waste
a magnificent highly recommend a poorly buyer beware
great fun are amazing excuse for total waste
offers an is superb wasted my wasted my
pleasantly surprised excellent condition complete waste big disappointment
every penny exceeded my skip this a disappointment
great introduction superb big disappointment of junk
pleasure to pleasantly surprised zero stars hard earned
and accessible is awesome terrible book really disappointed
be required great condition worst books a joke
really helped great product poorly organized don't buy
be missed not disappoint good reviews stinks
not disappoint i highly your money money back
top notch excellent choice disappointing a poorly
terrific book best ever boring book poor quality
beautifully written excellent i a refund returned this
excellent resource loves this unfortunately this insult to
transcends outstanding poorly written or money
renewed delighted with factual errors extremely disappointed
great collection recomend it glowing reviews is terrible
fabulous book gem new here disappointment
must-have loves it disappointment i not buy
first rate very pleased total waste not recommended
an outstanding definitely recommend am disappointed stay away
refreshing and no nonsense was boring don't bother
you wanting also great irritated worthless
a pleasure can't beat even finish i regret
developing a the raw disappointing i huge disappointment
teaches us great look had hoped never buy
from home thumbs up disappointment dud
poems and she loves drivel disappointing
very comprehensive love this a waste the trash

Table 2. Top 50 most positive and negative terms for the Books domain as determined by in-domain Delta IDF vs. the top most positive and negative terms as determined by TISA using Books, DVDs, and Electronics. All terms shown have the correct sentimental orientation and are strongly oriented. However, in-domain Delta IDF identifies many features, shown in bold and highlighted in red, that will not generalize well to non-book data. Instead, TISA placed more importance on terms, shown in italics and highlighted in green, that should generalize very well to other domains.


accuracy. One of TISA's main benefits is that it allows machine learning practitioners to leverage large amounts of dissimilar data by reducing the impact of the dissimilarity. The tools entry in Table 1 is a clear example of why this approach is so important: using more training data is the entire point of domain adaptation.

One reason why TISA is very accurate is that it preserves and intelligently uses the information captured by splitting the document pool into different domains or topic areas. Consider building a Delta IDF dot product classifier using the union of all the data that the topic independent model was trained on. Table 3 shows how the TISA model is more accurate than a Delta IDF classifier created from the union of the same set of documents, at an accuracy of 89.6% to 86.3%. This difference is significant at the 99.5% confidence level.

Target Category    Dom Size    In-Dom Model    TISA Model    Union Model    Weighted Voting
Tools                    19         56.67          94.74         73.68            84.21
Instruments              93         84.28          90.32         88.17            87.10
Office                  109         76.32          89.91         88.07            87.16
Automotive              314         80.92          85.99         81.85            80.57
Food                    377         76.41          88.86         86.21            87.27
Computer Games          485         72.77          88.04         85.98            84.33
Outdoor                 593         84.13          92.41         90.22            89.71
Jewelry                 606         85.32          89.44         88.45            88.61
Grocery                 654         84.25          89.14         88.23            88.69
Cell Phone              692         83.66          83.82         78.90            79.05
Beauty                  821         85.38          90.62         87.33            88.67
Magazines              1124         85.40          89.70         86.39            87.82
Software               1551         79.44          87.43         84.53            83.75
Camera                 1718         86.54          91.56         88.07            88.53
Baby                   1756         89.41          90.32         89.07            89.46
Sports                 2029         87.09          90.24         87.83            88.37
Apparel                2603         89.16          89.90         88.21            89.44
Health                 2713         87.36          89.31         85.51            86.10
Video                  4726         84.19          89.46         90.12            88.30
Toys                   4929         86.87          90.40         89.06            89.53

Average                2317         82.28          89.60         86.30            86.83

Table 3. General TISA "BDEKM" model built from the Books, DVDs, Electronics, Kitchen Appliances, and Music Delta IDF models vs. weighted voting with these models vs. a single Delta IDF model built on the union of all the Books, DVDs, Electronics, Kitchen Appliances, and Music data. Results have been sorted by size. The 10-fold in-domain accuracies for each test domain are displayed for reference.

A popular alternative technique to leverage more out-of-domain data is to use multiple classifiers under a weighted voting approach. Delta IDF dot product classification is particularly well suited to this approach because, when both the documents and the weight vectors are normalized to unit length, the magnitude of the dot product can serve as the vote’s weight. Weighted voting using the books, DVDs, electronics, kitchen appliances, and music domains over the test domains is 86.83% accurate. The difference between weighted voting and the TISA method using the same training and test points is significant at the 99.9% confidence level. This weighted voting approach is statistically no different from the union model, as indicated by a p-value of .3555. These results are displayed in detail in Table 3.
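The weighted voting scheme described above can be sketched as follows. This is a minimal illustration using sparse term→weight dictionaries; the function and variable names are ours, not the paper's:

```python
import math

def normalize(vec):
    """Scale a sparse term -> weight vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def dot(doc_vec, weight_vec):
    """Dot product of two sparse vectors over their shared terms."""
    return sum(w * weight_vec[t] for t, w in doc_vec.items() if t in weight_vec)

def weighted_vote(doc_vec, domain_models):
    """Each per-domain Delta IDF model casts a vote whose sign is its
    classification and whose weight is the magnitude of its dot product
    with the document; summing the signed dot products of unit-length
    vectors implements exactly this weighted vote."""
    doc_vec = normalize(doc_vec)
    total = sum(dot(doc_vec, normalize(m)) for m in domain_models)
    return "positive" if total >= 0 else "negative"
```

Because both vectors are unit length, each signed dot product already encodes the vote (its sign) and the vote's weight (its magnitude), so no separate weighting step is needed.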

4.3 Sentiment Feature Mining

In many cases it is valuable to know what the important domain-specific bias features are. For example, someone who is shopping for clothes may want to know why a specific article of clothing was rated poorly by users. While reporting to the shopper the highest scoring topic-independent features for the product will clearly show that people did not like the product, it will not do a good job of showing why, because topic-independent features are very generic. To solve this sentiment mining problem we must report to the shopper the topic-specific reasons why people did not like the article of clothing.

Fortunately, the topic-independent model can be used to automatically generate topic-specific sentiment models. These topic-specific models can then be used to report specific reasons why people liked or disliked the topic.

Positive Terms                        Negative Terms
compliments on    toe is             returned them    poor customer
great quality     hubby              holes            received a
is comfortable    thick as           defective        credited
are soft          so soft            cheaply made     disappointed when
great item        wanted a           the return       recieved the
confortable       tons of            policy           post
great shoes       best bra           make sure        return shipping
ones and          locally            charged          cancelled
and comfortable   monday             the photo        i emailed
with jeans        great to           remove           never order
great bag         really great       sent the         ears
fit very          definitely buy     so thin          wont
them very         best shoes         send the         item back
khaki             are exactly        the ankle        top and
were exactly      sleek              off my           tore
comfortable they  walking shoe       ordered <num>    too wide
is slightly       good shoe          known i          see
he really         ride up            times and        the seam
love em           last forever       holes in         just about
feels great       things and         shrunk           pay to
reasonable price  under jeans        so tight         pants were
many different    very confortable   <num> sizes      thin that
bra ever          as thick           big and          opened
comfortable from  wanted something   thin and         ordered a
even in           tons               torn             uncomfortable the

Table 4. Top 50 most positive and negative terms mined for the apparel topic area using the topic independent model built by TISA on books, DVDs, electronics, kitchen appliances, and music data. The terms are strongly sentimental and are correctly oriented for apparel. The terms tend to be very specific to the apparel topic area.

This takes three steps: (1) gather a set of documents about the topic the user is interested in, (2) classify every document using the topic-independent model and label each as positive or negative with the classifier’s decision, (3) compute ∆IDF(t) scores for terms in the set of documents that were mechanically labeled in the previous step. The top-scoring features of this model are the strongest reasons why people liked or disliked the product. Table 4 shows the top 50 strongest sentimental terms for the clothing topic computed using this method.
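The three steps above can be sketched as follows. This assumes a common document-frequency formulation of ∆IDF in the spirit of [8], with add-one smoothing; the function names and the smoothing choice are ours, and `classify` stands in for the topic-independent model:

```python
import math
from collections import Counter

def delta_idf_scores(pos_docs, neg_docs):
    """Delta IDF(t) ~ log2((|N| * P_t) / (|P| * N_t)) with add-one
    smoothing, where P_t / N_t count the positive / negative documents
    containing term t. Higher scores skew toward positive documents."""
    p_df = Counter(t for d in pos_docs for t in set(d))
    n_df = Counter(t for d in neg_docs for t in set(d))
    P, N = len(pos_docs), len(neg_docs)
    terms = set(p_df) | set(n_df)
    return {t: math.log2(((N + 1) * (p_df[t] + 1)) /
                         ((P + 1) * (n_df[t] + 1))) for t in terms}

def mine_topic_features(topic_docs, classify, top_k=10):
    """Steps (1)-(3): take documents about the topic, machine-label them
    with the topic-independent classifier, then score terms by Delta IDF
    over those machine labels. Returns the strongest positive and
    negative terms."""
    pos = [d for d in topic_docs if classify(d) == "positive"]
    neg = [d for d in topic_docs if classify(d) != "positive"]
    scores = delta_idf_scores(pos, neg)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], ranked[-top_k:]
```

The key point is that step (3) needs no hand labels at all: the machine labels from step (2) are good enough to surface topic-specific sentiment terms like those in Table 4.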

5 Related Work

Supervised machine learning is a common approach for sentiment analysis. Normally, a classifier is trained on a hand-labeled dataset for the specific topic area of interest. Training these classifiers generally takes a long time, but once they are trained they can rapidly make accurate judgments of the type they were trained to make, on the kinds of things they were exposed to during the training process. Using Support Vector Machines [6] with a bag-of-words feature space is one of the most popular examples of this approach, including the seminal work on sentiment analysis for movies [11].

While these in-domain methods work well in a predefined topic area with a sufficient amount of labeled data, they do not work well when used outside of the predefined topic area. As a result, these methods do not work well for important applications, such as personal assistants, that need to provide answers for any domain, or topic area, that the user is interested in at the moment.

Current domain-adaptation approaches such as CODA [4], SCL-MI [2], SFA [10], and Coupled Subspaces [3] build a model for a domain, which has no labeled data, using labeled data from a different domain. This is unacceptable because it is infeasible to train a new model in real-time whenever an electronic personal assistant encounters a question about a new domain.

To address these challenges and enable personal assistants to succeed in unexpected topic areas, we took a strikingly different approach: re-scoring sentiment features by their domain independence. Our work alone has been designed to build models that remain highly accurate even when they are used on unfamiliar topics that may be vastly different.

In a business setting it is highly desirable to be able to deploy trained models on new topic areas that they were not designed for. Training these models should not require any special changes for the topic area. Furthermore, these models should be highly accurate in every topic area they will be used upon, even if the list of those topic areas is unknown. Unlike state-of-the-art domain adaptation approaches, TISA fulfills these demands, as summarized in Table 5.

Our approach is highly accurate across 20 never-before-seen test domains. Surprisingly, our algorithm is even more accurate than models that were custom-tailored to the test domains.


Comparison Criteria                        TISA   In-Dom  SCL-MI  SFA-DI  CODA with  CODA with
                                                  ∆IDF                    0 Target   1600 Target
Situations Modeled                         20*    20**    12***   12***   12***      12***
Requires Labeled Data from Other Domains   Yes    No      Yes     Yes     Yes        Yes
Requires In-domain Labeled Data            No     Yes     No      No      No         Yes
Requires Unlabeled In-domain Data          No     No      Yes     Yes     Yes        Yes
Average Accuracy                           89.6   82.28   77.97   78.66   83.23      86.46
Variance                                   5.05   55.70   25.38   17.29   11.54      2.89

Table 5. TISA has the easiest-to-satisfy training data requirements, and is simple, fast, highly accurate, and reliable. Caution should be taken when directly comparing the average accuracy and variance numbers of TISA and our ∆IDF baseline to other published approaches due to the different training environments described.

6 Conclusion

In this paper we showed that topic-independent sentiment analysis is highly important for a wide array of applications. We pointed out how state-of-the-art domain-adaptation approaches do not address these problems. To address them, we designed our approach with the core goal of accurate sentiment classification for unforeseen topic areas.

Our algorithm has several advantages over other approaches because it does not require any information about the topic area, including labeled or unlabeled data from that area. First, machine learning experts can use our scoring algorithm with the most appropriate algorithm for the task at hand. Second, even if the training data has been lost, is inaccessible for business reasons, or the expertise to tune the original algorithm is no longer available, existing models can still be used with TISA to produce topic-independent models. Third, training time is substantially reduced for super-linear training algorithms by cutting the number of documents down into multiple smaller pools. Fourth, TISA can leverage existing labeled data in any number of topic areas. We speculate that this reduces overfitting and leads to our demonstrated better results.

TISA is the only truly scalable topic-independent sentiment analysis solution for real-world problems. A single topic-independent model built using TISA

* Each modeled situation corresponds to a product review category, since each is a held-out test set.

** Each product review category is a topic area and is treated as a test situation. Although 10-fold cross-validation is used in each product review category, folds are not counted as test situations. Average and variance scores are computed over test situations. Please note that the average and variance reported in this table for ∆IDF include domains that TISA was trained on.

*** Each unique source/target product review category pair is treated as a modeled situation. Every domain adaptation source/target pair for the Books, DVDs, Electronics, and Kitchen product review categories was modeled.


is vastly preferable to using multiple models built either in-domain or built using domain adaptation, for the following reasons: One, a single model is much easier and less costly to create and maintain. Two, topic-independent models created using TISA are even more accurate than topic-specific models due to their ability to leverage more data and reduce the effects of noisy features. Three, our topic-independent models are 11 times more reliable than domain-specific models. Considering the fact that TISA models require no changes to work well on a new topic area, we can confidently state that TISA is the best choice for sentiment analysis, especially for real-world applications. In the future we plan to enhance TISA for vastly different writing styles and by incorporating sentiment personalization into our algorithm.

References

1. S. Baccianella, A. Esuli, and F. Sebastiani. Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta, May 2010. European Language Resources Association (ELRA).

2. J. Blitzer, M. Dredze, and F. Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Annual Meeting-Association For Computational Linguistics, volume 45, pages 440–447, 2007.

3. J. Blitzer, S. Kakade, and D. P. Foster. Domain adaptation with coupled subspaces. Journal of Machine Learning Research - Proceedings Track, 15:173–181, 2011.

4. M. Chen, K. Q. Weinberger, and J. Blitzer. Co-training for domain adaptation. In NIPS, pages 2456–2464, 2011.

5. C. Fellbaum. Wordnet. Theory and Applications of Ontology: Computer Applications, pages 231–243, 2010.

6. T. Joachims. Text categorization with support vector machines: Learning with many relevant features. Machine Learning: ECML-98, pages 137–142, 1998.

7. J. Martineau. Identifying and Isolating Text Classification Signals from Domain and Genre Noise for Sentiment Analysis. PhD thesis, University of Maryland, Baltimore County, Computer Science and Electrical Engineering, December 2011.

8. J. Martineau, T. Finin, A. Joshi, and S. Patel. Improving binary classification on text problems using differential word features. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 2019–2024. ACM, 2009.

9. G. Paltoglou and M. Thelwall. A study of information retrieval weighting schemes for sentiment analysis. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1386–1395. Association for Computational Linguistics, 2010.

10. S. Pan, X. Ni, J. Sun, Q. Yang, and Z. Chen. Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th International Conference on World Wide Web, pages 751–760. ACM, 2010.

11. B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, pages 79–86. Association for Computational Linguistics, Morristown, NJ, USA, 2002.

12. C. Strapparava and A. Valitutti. Wordnet-affect: An affective extension of WordNet. In Proceedings of LREC, volume 4, pages 1083–1086, 2004.

