Stylistics in Customer Reviews of Cultural Objects
Xiao Hu, J. Stephen Downie
The International Music Information Retrieval Systems Evaluation Lab (IMIRSEL)
University of Illinois at Urbana-Champaign
THE ANDREW W. MELLON FOUNDATION
Agenda
Motivation
Customer reviews in epinions.com
Experiments
  Genre classification
  Rating classification
  Usage classification
  Feature studies
Conclusions & Future Work
Motivation
Online customer reviews on cultural objects:
  User-generated: user-centered retrieval
  Detailed descriptions: contextual info.
  Large amount: rich resource
  Self-organized: ground truth
Text mining: mature techniques and handy tools
Review mining: a place to play stylistic text analysis!
Motivation
[Figure: workflow diagram. Customer reviews (Epinions.com, Amazon.com, ...) are classified into classes, user descriptions (D1, D2, D3) and prominent features are identified, and the results are connected to the objects as user-centered access points: genres, ratings, usages.]
Customer Reviews
Published on www.epinions.com
Focused on book, movie, and music reviews
Each review is associated with:
  a genre label
  a numerical quality rating
  a recommended usage (for music reviews)
[Figure: sample review page annotated with the associated numerical rating, the full text to be analyzed, and the recommended usage.]
Genre Taxonomy (music)
Jazz, Rock, Country, Classical, Blues, Gospel, Punk, …
Renaissance, Medieval, Baroque, Romantic, …
28 Major Genre Categories
Experiments
Goal: to build and evaluate a prototype system that could automatically:
  predict the genre of the work being reviewed
  predict the quality rating assigned to the reviewed item
  predict the usage recommended by the reviewer
  discover distinctive features contributing to each of the above
Models and Methods
Prediction problem: Naïve Bayesian (NB) classifier
  Computationally efficient
  Empirically effective
Hierarchical clustering (for usage prediction only)
Feature analysis:
  Frequent pattern mining
  Naïve Bayesian feature ranking
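A minimal sketch of a multinomial Naïve Bayes text classifier, to make the prediction step concrete. The slides do not say which NB variant or smoothing was used; add-one smoothing, the function names, and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model from token lists and class labels."""
    class_docs = Counter(labels)           # number of documents per class
    term_counts = defaultdict(Counter)     # class -> term -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, term_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(term_counts[label].values())
        score = math.log(n / n_docs)       # log prior
        for tok in tokens:
            if tok in vocab:               # ignore terms never seen in training
                score += math.log((term_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, not taken from the review corpus.
toy_docs = [["guitar", "riff", "loud"], ["piano", "concerto", "orchestra"],
            ["guitar", "band", "riff"], ["violin", "concerto", "piano"]]
toy_labels = ["metal", "classical", "metal", "classical"]
model = train_nb(toy_docs, toy_labels)
```

Calling `predict_nb(model, ["guitar", "riff"])` on this toy model picks the class whose training reviews used those terms most often.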
Data Preprocessing
HTML tags were stripped out
Stop words were NOT stripped out
Punctuation was NOT stripped out
  They may contain stylistic information
Tokens were stemmed
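The preprocessing choices above could be sketched as follows. This is only an illustration: the regular expressions and the `preprocess` name are assumptions, and the stemming step is left out because the slides do not name the stemmer that was used.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                      # crude HTML tag matcher
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")       # words, or single punctuation marks

def preprocess(html_text):
    """Strip HTML tags, then keep every word and punctuation mark as a token."""
    text = TAG_RE.sub(" ", html_text)
    # Stop words and punctuation are deliberately kept as tokens.
    return [tok.lower() for tok in TOKEN_RE.findall(text)]

tokens = preprocess("<p>It's not <b>good</b>!</p>")
```

Here `tokens` keeps "not" and "!" alongside the content words, so negation and emphasis remain available as stylistic features.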
Genre Classifications
Data set
Reviews on                    Book          Movie         Music
Number of reviews             1,800         1,650         1,800
Number of genres              9             11            12
Mean of review length         1,095 words   1,514 words   1,547 words
Std. Dev. of review length    446 words     672 words     784 words
Term list size                41,060        47,015        47,864
Genres Examined
Book                         Movie                          Music
Action / Thriller            Action / Adventure             Blues
Juvenile Fiction             Children                       Classical
Humor                        Comedies                       Country
Horror                       Horror / Suspense              Electronic
Music & Performing Arts      Musical & Performing Arts      Gospel
Science Fiction & Fantasy    Science-Fiction / Fantasy      Hardcore / Punk
Biography & Autobiography    Documentary                    Heavy Metal
Mystery & Crime              Dramas                         International
Romance                      Education / General Interest   Jazz Instrument
                             Japanimation (Anime)           Pop Vocal
                             War                            R&B
                                                            Rock & Pop
Genre Classification Results
Reviews on                          Book     Movie    Music
Number of genres                    9        11       12
Reviews in each genre               200      150      150
Term list size (terms)              41,060   47,015   47,864
Mean of review length (words)       1,095    1,514    1,547
Std Dev of review length (words)    446      672      784
Mean of precision                   72.18%   67.70%   78.89%
Std Dev of precision                1.89%    3.51%    4.11%

5-fold random cross-validation for book and movie reviews
3-fold random cross-validation for music reviews
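The mean and standard deviation of precision reported above come from repeated train/test splits. A rough sketch of random k-fold splitting and the summary statistics follows; the helper names are hypothetical, and whether the slides report the sample or the population standard deviation is not stated (the sample version is assumed here).

```python
import random
import statistics

def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and split them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def summarize(fold_scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)

# 1,800 book reviews split into 5 folds of 360 reviews each.
folds = k_fold_indices(1800, 5)
```

Each fold serves once as the test set while the other folds train the classifier; `summarize` is then applied to the k precision values.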
Confusion: Book Reviews
Rows: actual genre. Columns: classified as.

            Action  Bio.   Hor.   Hum.   Juv.   Mus.   Mys.   Rom.   Sci.
Action      0.61    0.01   0.06   0.01   0.02   0.03   0.20   0.05   0.02
Bio.        0.04    0.70   0.01   0.05   0.03   0.13   0.01   0.03   0
Horror      0.09    0      0.66   0      0.05   0      0.12   0.02   0.06
Humor       0.01    0.10   0      0.74   0.03   0.08   0.01   0.01   0.03
Juvenile    0.01    0.01   0      0.07   0.86   0.02   0      0.02   0
Music       0       0.09   0      0      0.01   0.89   0      0      0.01
Mystery     0.20    0      0.01   0      0.01   0      0.70   0.05   0.04
Romance     0.06    0.01   0.01   0      0.04   0      0.08   0.78   0.03
Science     0.03    0      0.02   0.01   0.11   0.03   0.01   0.13   0.66
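A confusion table like the one above can be derived from true and predicted labels by counting pairs and normalizing each row. The sketch below assumes rows are the actual classes and columns the predicted ones; the function name and toy labels are illustrative.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix: rows are true classes, columns predictions."""
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = []
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        matrix.append([counts[(t, p)] / row_total if row_total else 0.0
                       for p in classes])
    return matrix

# Hypothetical toy labels: one of two "a" reviews is mistaken for "b".
toy = confusion_matrix(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"])
```

The diagonal of each row is the share of that class classified correctly, which is how the 0.61 for Action in the book table should be read.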
Confusion: Movie Reviews
Rows: actual genre. Columns: classified as.

            Act.   Ani.   Chi.   Com.   Doc.   Dra.   Edu.   Hor.   Mus.   Sci.   War
Action      0.77   0      0      0.01   0      0.01   0.02   0      0      0.10   0.09
Anime       0      0.89   0.03   0.03   0      0      0      0      0      0.05   0
Children    0.02   0.01   0.95   0      0.01   0.01   0.01   0      0      0      0
Comedy      0.09   0.01   0.06   0.52   0.03   0.17   0.06   0.01   0.03   0.01   0.02
Docu.       0.02   0      0      0.04   0.63   0.01   0.19   0      0.09   0      0.02
Drama       0.16   0      0      0.12   0.10   0.45   0.05   0.03   0.03   0.01   0.04
Edu.        0      0      0.02   0.02   0.31   0.03   0.57   0      0      0.01   0.03
Horror      0.15   0.02   0.02   0.02   0.03   0.02   0.05   0.69   0      0.10   0.02
Music       0      0      0      0.01   0.18   0      0      0      0.81   0      0
Science     0.04   0.01   0.02   0      0.06   0.01   0.02   0.03   0      0.76   0.05
War         0.11   0      0.01   0.01   0.08   0.08   0.05   0.03   0.02   0.02   0.59
Confusion: Music Reviews
Rows: actual genre. Columns: classified as.
Classified as: Blu. Cla. Cou. Ele. Gos. Pun. Met. Int’l Jazz Pop. RB Roc.
Blues 0.61 0 0.10 0 0 0 0 0 0 0 0 0.29
Classical 0 0.94 0 0.03 0 0 0 0 0 0 0 0.03
Country 0 0 0.92 0 0.03 0 0 0 0 0 0 0.06
Electr. 0 0 0 0.92 0 0 0.06 0 0 0 0 0.03
Gospel 0 0 0.05 0 0.80 0 0 0 0 0 0.05 0.10
Punk 0 0 0 0.05 0 0.71 0.05 0 0 0 0 0.19
Metal 0 0 0 0 0 0 0.89 0 0 0 0 0.11
Int’l 0 0.04 0.00 0.04 0 0 0 0.81 0 0 0 0.04
Jazz 0 0 0 0.04 0 0 0 0 0.89 0.04 0 0.04
Pop Vo. 0 0 0.04 0.07 0 0 0 0.04 0.07 0.68 0 0.11
R&B 0 0 0 0 0 0 0 0 0 0.06 0.88 0.06
Rock 0.03 0 0.03 0 0 0 0.03 0 0 0.03 0 0.89
Rating Classification
Five-class classification
  1 star vs. 2 stars vs. 3 stars vs. 4 stars vs. 5 stars
Binary group classification
  1 star + 2 stars vs. 4 stars + 5 stars
Ad extremis classification
  1 star vs. 5 stars
5-fold random cross-validation for book and movie review experiments
5-fold cross-validation for music review experiments
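The three rating setups differ only in how star ratings map to class labels and which reviews are left out. A small sketch of that mapping, with hypothetical scheme and label names:

```python
def rating_label(stars, scheme):
    """Map a 1-5 star rating to a class label, or None if the review is excluded."""
    if scheme == "five_class":
        return str(stars)
    if scheme == "binary_group":
        # 3-star reviews are left out of the binary group experiment.
        return {1: "neg", 2: "neg", 4: "pos", 5: "pos"}.get(stars)
    if scheme == "ad_extremis":
        # Only the extreme ratings are kept.
        return {1: "neg", 5: "pos"}.get(stars)
    raise ValueError(f"unknown scheme: {scheme}")
```

For example, a 3-star review maps to `"3"` in the five-class setup and to `None` (excluded) in the other two.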
Rating : Book Reviews
Experiments                         5 classes   Binary group   Ad extremis
Number of classes                   5           2              2
Reviews in each class               200         400            300
Term list size (terms)              34,123      28,339         23,131
Mean of review length (words)       1,240       1,228          1,079
Std Dev of review length (words)    549         557            612
Mean of precision                   36.70%      80.13%         80.67%
Std Dev of precision                1.15%       4.01%          2.16%
Rating : Movie Reviews
Experiments                         5 classes   Binary group   Ad extremis
Number of classes                   5           2              2
Reviews in each class               220         440            400
Term list size (terms)              40,235      36,620         31,277
Mean of review length (words)       1,640       1,645          1,409
Std Dev of review length (words)    788         770            724
Mean of precision                   44.82%      82.27%         85.75%
Std Dev of precision                2.27%       2.02%          1.20%
Rating : Music Reviews
Experiments                         5 classes   Binary group   Ad extremis
Number of classes                   5           2              2
Reviews in each class               200         400            400
Term list size (terms)              35,600      33,084         32,563
Mean of review length (words)       1,875       2,032          1,842
Std Dev of review length (words)    913         912            956
Mean of precision                   44.25%      79.75%         85.94%
Std Dev of precision                2.63%       3.59%          3.58%
Confusion : Book Reviews
Rows: actual rating. Columns: classified as.

           1 star   2 stars   3 stars   4 stars   5 stars
1 star     0.45     0.21      0.15      0.09      0.10
2 stars    0.24     0.36      0.19      0.12      0.09
3 stars    0.11     0.17      0.28      0.22      0.21
4 stars    0.05     0.06      0.17      0.41      0.31
5 stars    0.04     0.07      0.17      0.26      0.46
Confusion : Movie Reviews
Rows: actual rating. Columns: classified as.

           1 star   2 stars   3 stars   4 stars   5 stars
1 star     0.49     0.19      0.17      0.08      0.07
2 stars    0.15     0.45      0.23      0.11      0.06
3 stars    0.04     0.24      0.28      0.27      0.17
4 stars    0.05     0.13      0.13      0.41      0.27
5 stars    0.07     0.03      0.16      0.20      0.54
Confusion : Music Reviews
Rows: actual rating. Columns: classified as.

           1 star   2 stars   3 stars   4 stars   5 stars
1 star     0.61     0.24      0.07      0.05      0.02
2 stars    0.24     0.15      0.36      0.15      0.09
3 stars    0.11     0.13      0.41      0.20      0.15
4 stars    0.03     0.06      0.10      0.32      0.48
5 stars    0        0         0.09      0.11      0.80
Usage Classification
Each music review has one usage suggested by the reviewer
It can be chosen from a ready-made list of 13 usages
Chose the most popular 11 usages for experiments
Usage Categories and Counts
Usage                            Count
Driving (DRV)                    1,349
Hanging With Friends (HWF)       1,215
Listening (LST)                  592
Romancing (ROM)                  492
Reading or Studying (ROS)        447
Getting ready to go out (GRG)    378
Exercising (EXC)                 291
Waking up (WKU)                  271
Going to Sleep (GTS)             269
Cleaning the House (CTH)         230
At Work (AWK)                    188
With Family                      35
Sleeping                         15
TOTAL                            5,772
Data and initial result
Experiments                         All classes
Number of classes                   11
Reviews in each class               180
Term list size (terms)              36,561
Mean of review length (words)       838.75
Std Dev of review length (words)    511.39
Mean of precision                   19.55%
Std Dev of precision                2.89%

10-fold cross-validation
Confusion matrix
Rows: actual usage. Columns: classified as.
Classified as: AWK CTH DRV EXC GRG GTS HWF LST ROS ROM WKU
AWK .139 .139 .100 .067 .056 .05 .144 .078 .056 .100 .072
CTH .072 .283 .128 .033 .022 .022 .094 .050 .083 .128 .083
DRV .106 .111 .150 .078 .089 .039 .133 .050 .050 .083 .111
EXC .094 .089 .089 .111 .111 .028 .206 .056 .028 .078 .111
GRG .133 .083 .161 .083 .106 .033 .133 .067 .044 .106 .050
GTS .056 .072 .089 .056 .039 .194 .078 .111 .078 .133 .094
HWF .128 .100 .083 .039 .094 .028 .272 .050 .050 .072 .083
LST .083 .067 .072 .039 .044 .100 .061 .189 .089 .167 .089
ROS .089 .122 .072 .022 .067 .106 .056 .100 .111 .172 .083
ROM .056 .128 .067 .028 .039 .028 .017 .044 .061 .500 .033
WKU .078 .106 .100 .067 .083 .083 .150 .072 .072 .094 .094
Usage super-classes
Frequent confusions: a measure of similarity
Hierarchical clustering based on the confusion matrix
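Hierarchical clustering
[Figure: dendrogram of the 11 usages. Leaf order: Going to sleep, Listening, Getting ready to go out, Driving, Reading or studying, Romancing, Cleaning the house, At work, Hanging out with friends, Waking up, Exercising. The tree is cut into two super-classes, Relaxing and Stimulating, and further into four sub-classes R1, R2, S1, S2.]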
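One way to turn a confusion matrix into clusters is to treat mutual confusion as similarity and merge the most similar groups first. The sketch below uses average linkage, which is an assumption: the slides do not state the linkage method. The function name and the toy matrix are hypothetical as well.

```python
def cluster_by_confusion(labels, conf, n_clusters):
    """Average-linkage agglomerative clustering with mutual confusion as similarity."""
    index = {name: i for i, name in enumerate(labels)}

    def sim(a, b):
        # Symmetrize: how often a is taken for b and b for a.
        i, j = index[a], index[b]
        return (conf[i][j] + conf[j][i]) / 2

    clusters = [[name] for name in labels]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[x] for b in clusters[y]]
                score = sum(sim(a, b) for a, b in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]   # merge the most-confused pair
        del clusters[y]
    return clusters

# Hypothetical 4-class confusion matrix: A/B and C/D are confused with each other.
toy_conf = [[0.60, 0.30, 0.05, 0.05],
            [0.30, 0.60, 0.05, 0.05],
            [0.05, 0.05, 0.50, 0.40],
            [0.05, 0.05, 0.40, 0.50]]
groups = cluster_by_confusion(["A", "B", "C", "D"], toy_conf, 2)
```

Stopping at two clusters corresponds to the Relaxing/Stimulating cut; stopping at four would correspond to the R1, R2, S1, S2 level.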
Classifications on usage super-classes
Experiments                         Relaxing, Stimulating   R1, R2, S1, S2
Number of classes                   2                       4
Reviews in each class               900                     360
Term list size (terms)              34,759                  30,637
Mean of review length (words)       839.03                  825.45
Std Dev of review length (words)    509.96                  514.38
Mean of precision                   65.72%                  42.60%
Std Dev of precision                3.15%                   4.60%

10-fold cross-validation
Feature studies
What makes the classes distinguishable?
  What are important features?
  How important are they?
Two techniques applied
  Frequent Pattern Mining
  Naïve Bayesian Feature Ranking
Focus on music reviews
Frequent Pattern Mining (FPM)
Originally used to discover association rules
Finds patterns consisting of items that frequently occur together in individual transactions
  Items = candidate words (terms), depending on specific questions
  Transactions = review sentences
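With sentences as transactions and terms as items, pattern counting can be sketched by brute-force enumeration of term sets per sentence. This is a stand-in for whatever frequent pattern mining algorithm was actually used (the slides do not name one); the function name, the support threshold, and the toy sentences are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(sentences, size, min_support):
    """Term sets of a given size that co-occur in at least min_support sentences."""
    counts = Counter()
    for tokens in sentences:
        # Each sentence is one transaction; a term counts once per sentence.
        for combo in combinations(sorted(set(tokens)), size):
            counts[combo] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}

# Hypothetical stemmed sentences.
toy_sentences = [["not", "realli", "good"], ["not", "good"],
                 ["realli", "good"], ["not", "bad"]]
pairs = frequent_patterns(toy_sentences, size=2, min_support=2)
```

Setting `size` to 1, 2, or 3 gives the single, double, and triple term patterns discussed on the following slides.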
Positive and negative descriptive patterns
Recall: rating classification on music reviews

Experiments                         5 classes   Binary group   Ad extremis
Number of classes                   5           2              2
Reviews in each class               200         400            400
Term list size (terms)              35,600      33,084         32,563
Mean of review length (words)       1,875       2,032          1,842
Std Dev of review length (words)    913         912            956
Mean of precision                   44.25%      79.75%         85.94%
Std Dev of precision                2.63%       3.59%          3.58%
Positive and negative descriptive patterns
Mining frequent descriptive patterns in positive and negative reviews
Reviews                            Positive         Negative
Total reviews                      400              400
Total sentences                    63,118           30,053
Total words                        1,027,713        447,603
Avg. (std) sentences per review    157.80 (75.49)   75.13 (41.62)
Avg. (std) words per sentence      16.28 (14.43)    14.89 (12.24)

Candidate terms: adjectives, adverbs, verbs, and negatives; no nouns, no stop words
Single term patterns
Positive Reviews                  Negative Reviews
not: 3,417 sentences              not: 1,915 sentences
good: 1,621 sentences             good: 1,025 sentences
(1/4 of all sentences)            (1/3 of all sentences)
Good = Bad?!
Digging deeper: "good" in a negative context
Negation:
  “Nothing is good.”
  “It just doesn't sound good.”
Song titles:
  “Good Charlotte, you make me so mad.”
  “Feels So Good is dated and reprehensibly bad.”
Rhetoric:
  “And this is a good ruiner: …”
  “What a waste of my good two dollars…”
Faint praise:
  “…the only good thing… is the packaging.”
Expressions:
  “You all have heard … the good old cliché.”
Double term patterns
Positive Reviews    Negative Reviews
not good            not good
not realli          not bad
realli good         not realli
not listen          not sound
not great           realli good

Good Bad?!
Digging deeper and deeper
Triple term patterns

Positive Reviews        Negative Reviews
sing open melod         not realli good
sing smooth melod       not realli listen
sing fill melod         bad not good
sing smooth open        bad not sound
not realli good         pretti tight spit
sing lead melod         bad not don’t
sound realli good       realli not don’t
sing plai melod         realli bad not
accompani sing melod    pretti bad not
sing soft melod         not sing sound
Noun patterns in genre classification
Recall: genre classification on music reviews

Reviews on                          Music
Number of genres                    12
Reviews in each genre               150
Term list size (terms)              47,864
Mean of review length (words)       1,547
Std Dev of review length (words)    784
Mean of precision                   78.89%
Std Dev of precision                4.11%
Noun patterns in genre classification
Studied four popular genres
Only nouns considered

Reviews                            Classical       Country          Heavy Metal      Jazz Instrument
Total reviews                      150             150              150              150
Total sentences                    7,886           16,720           21,532           12,692
Total words                        138,282         240,595          318,252          184,220
Avg. (std) sentences per review    52.57 (32.68)   111.47 (43.77)   143.55 (71.69)   84.61 (28.60)
Avg. (std) words per sentence      17.54 (12.25)   14.39 (11.79)    14.78 (12.33)    14.51 (10.16)
Single term patterns
Classical   Country   Heavy Metal   Jazz Instrument
music       song      song          song
record      album     album         album
piec        love      guitar        music
cd          music     band          solo
work        time      track         time
Double term patterns

Classical             Country          Heavy Metal     Jazz Instrument
cd music              twain shania     album song      music jazz
music piec            dixi chick       song guitar     liner note
piec piano            station union    riff guitar     drum bass
piano concerto        guitar steel     guitar bass     jazz album
orchestra symphoni    tim mcgraw       drum guitar     album song
music record          cash johnni      song lyric      jazz song
piano op              titl track       song riff       guitar bass
music work            song titl        song choru      tenor sax
music time            krauss alison    solo guitar     solo song
music compos          drum guitar      song track      piano bass
violin concerto       countri radio    album track     mile davi
cd piec               song beat        band album      solo piano
cd record             style song       band song       section rhythm
Naïve Bayesian Feature Ranking (NBFR)
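Based on the NB text categorization model
Prediction in binary classification cases:

$$\log\frac{P(C_j \mid d_i)}{P(\bar{C}_j \mid d_i)} = \log\frac{P(C_j)}{P(\bar{C}_j)} + \sum_{t=1}^{|V|} c(w_t, d_i)\,\log\frac{P(w_t \mid C_j)}{P(w_t \mid \bar{C}_j)}$$

where $c(w_t, d_i)$ is the count of term $w_t$ in review $d_i$ and $|V|$ is the size of the term list.

> 0: $d_i$ is in $C_j$
< 0: $d_i$ is not in $C_j$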
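In the binary NB decision rule, each term contributes its count times the log ratio of its class-conditional probabilities, so terms can be ranked by that ratio. The sketch below illustrates the idea; add-one smoothing, the tie-breaking rule, the function name, and the toy reviews are assumptions rather than details given in the slides.

```python
import math
from collections import Counter

def rank_features(docs_in_class, docs_out_class, top_n=10):
    """Rank terms by log P(w|C) / P(w|not C), estimated with add-one smoothing."""
    inside = Counter(tok for doc in docs_in_class for tok in doc)
    outside = Counter(tok for doc in docs_out_class for tok in doc)
    vocab = set(inside) | set(outside)
    n_in, n_out = sum(inside.values()), sum(outside.values())

    def log_ratio(term):
        p_in = (inside[term] + 1) / (n_in + len(vocab))
        p_out = (outside[term] + 1) / (n_out + len(vocab))
        return math.log(p_in / p_out)

    # Highest ratio first; ties are broken alphabetically for a stable order.
    return sorted(vocab, key=lambda t: (-log_ratio(t), t))[:top_n]

# Hypothetical toy reviews for two usage super-classes.
relaxing = [["soft", "piano", "calm"], ["calm", "voice"]]
stimulating = [["loud", "riff"], ["loud", "drum", "voice"]]
top_relaxing = rank_features(relaxing, stimulating, top_n=1)
```

Swapping the two arguments ranks the terms most indicative of the other class.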
Features in usage super-classes
Recall: classification on usage super-classes
Experiments                         Relaxing, Stimulating
Number of classes                   2
Reviews in each class               900
Term list size (terms)              34,759
Mean of review length (words)       839.03
Std Dev of review length (words)    509.96
Mean of precision                   65.72%
Std Dev of precision                3.15%
Top-ranked terms in super-classes
Relaxing               Stimulating
Botti (Chris)          Dio (Ronnie James)
Shelby (Lynne)         Roca (Zach De La Roca)
Bethany (Joy)          Slade (British band)
Debelah (Morgan)       Incubus (band)
Mckennitt (Loreena)    Edan (rap artist)
Pontiy (Jean Luc)      Twiztid (band)
Shabazz (lyricist)     KJ (KJ52)
Tru                    blue
nightwish              Serj (Tankian)
Tarja (Turunen)        Stooges (The)

Terms in parentheses were manually added for clarity
Artist-usage relationship
Binomial exact test on artists with >10 reviews (p < 0.05)
Artist           Usage            p value
AFI              Waking Up        0.03252
Black Sabbath    At Work          0.00028
Celine Dion      Romancing        0.02499
Dream Theater    Listening        0.01862
Metallica        Waking Up        0.03252
Nirvana_(USA)    Going to Sleep   0.01862
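A binomial exact test asks how likely it is that at least k of an artist's n reviews recommend a given usage if reviewers chose that usage at its overall base rate. The sketch below computes the one-sided tail probability; whether the slides used a one-sided or two-sided test is not stated, and the example numbers are hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact test p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 5 of an artist's 12 reviews suggest a usage whose
# base rate across all music reviews is 271 / 5772 (the Waking up count).
p_value = binomial_tail(5, 12, 271 / 5772)
```

An artist-usage pair would be reported when this tail probability falls below the 0.05 threshold.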
Implementation & T2K (demo)
Text-to-Knowledge (T2K) Toolkit
  A text mining framework
  Ready-to-use modules and itineraries
  Natural Language Processing tools integrated
  Supporting fast prototyping of text mining
[Figure: T2K itinerary screenshots labeled "Data Preprocessing" and "NB Classifier".]
Conclusions
Text analysis of user-generated reviews on cultural objects
  NB on genre, rating, and usage classification
  Feature studies: FPM and NBFR
Customer reviews are good resources for connecting users’ opinions to cultural objects and thus facilitating information access via novel, user-oriented facets.
Future work
More text mining techniques
Other critical text
  blogs, wikis, etc.
Feature studies
  other kinds of features
Questions?
Thank you!