ACMMM 2013 reading: Large-scale visual sentiment ontology and detectors using adjective noun pairs

ACMMM2013 reading @ Kanto CV 2014.2.23

Akisato Kimura (@_akisato) NTT Communication Science Labs

Paper to read

Sentiment analysis of images

Basic strategy

• Adjective noun pairs (ANPs)
  – Adjectives play a significant role in conveying sentiments, but are visually inconsistent.
  – Combined phrases make the concepts more detectable than a single adjective or noun.
• cf. Recognition using visual phrases [CVPR11]

Contributions

• Automatically construct a large-scale Visual Sentiment Ontology (VSO) with 3000 ANPs
  – With the help of psychological theories and web mining techniques

• Propose SentiBank: a visual concept detector library to detect the presence of 1200 ANPs
  – Useful as attributes for sentiment analysis of visual content

Framework

1. Select 24 fundamental words representing emotions
2. Retrieve images using each of the words as a query
3. Extract the tags associated with the images to build ANPs (= strong-sentiment ADJs + all Ns)
4. Train ANP detectors and keep only those with reasonable performance to form SentiBank

Framework

24 basic words for emotions

• Founded on Plutchik’s Wheel of Emotions

http://en.wikipedia.org/wiki/Plutchik%27s_Wheel_of_Emotions#Plutchik.27s_wheel_of_emotions

24 basic words for emotions (cont.)

• 8 basic emotions x 3 intensity degrees
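As a sketch, the 24 query words can be laid out as 8 emotions with 3 intensity terms each. The term lists below are the standard intensity labels of Plutchik's wheel (as on the Wikipedia page linked above); the slides do not enumerate them, so treat the exact words as an assumption.

```python
# Plutchik's 8 basic emotions, each mapped to (strong, basic, mild) terms.
# Assumption: standard wheel labels; the slides do not list the 24 words.
PLUTCHIK = {
    "joy": ("ecstasy", "joy", "serenity"),
    "trust": ("admiration", "trust", "acceptance"),
    "fear": ("terror", "fear", "apprehension"),
    "surprise": ("amazement", "surprise", "distraction"),
    "sadness": ("grief", "sadness", "pensiveness"),
    "disgust": ("loathing", "disgust", "boredom"),
    "anger": ("rage", "anger", "annoyance"),
    "anticipation": ("vigilance", "anticipation", "interest"),
}


def basic_emotion_words():
    """Flatten the wheel into the 24 query words (8 emotions x 3 degrees)."""
    return [word for levels in PLUTCHIK.values() for word in levels]


if __name__ == "__main__":
    words = basic_emotion_words()
    print(len(words), words[:3])
```

Each of these words would then serve as one search query in the web mining step.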

Framework

Sentiment word discovery

• Web mining strategy
  – Retrieve images & videos from Flickr & YouTube with each of the 24 basic words as a query
  – Extract their associated tags with the Lookapp tool [Borth+ ICMR11]

Sentiment word discovery (cont.)

• Exploits various NLP techniques & resources
  – Post-processing
    • Remove stop words, perform stemming
    • Select the top 100 tags for each emotion
  – Sentiment value computation (from -1 = negative to +1 = positive)
    • SentiWordNet [Esuli+ 2006], SentiStrength [Thelwall+ 2010]
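A minimal sketch of the post-processing step, assuming each retrieved item carries a list of tag strings. The stop-word list and `naive_stem` are hypothetical stand-ins (a real pipeline would use a full stop-word list and a proper stemmer such as Porter's); only `k=100` comes from the slides.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "my", "of", "the", "to"}


def naive_stem(word: str) -> str:
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "s"):
        # Only strip when a reasonably long stem remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def top_tags(tag_lists: Iterable[List[str]], k: int = 100) -> List[str]:
    """Return the k most frequent stemmed, non-stop-word tags for one emotion query."""
    counts: Counter = Counter()
    for tags in tag_lists:
        for tag in tags:
            tag = tag.lower()
            if tag in STOP_WORDS:
                continue  # drop stop words before counting
            counts[naive_stem(tag)] += 1
    return [tag for tag, _ in counts.most_common(k)]


if __name__ == "__main__":
    retrieved = [["Flowers", "the", "sunny"], ["flower", "happy"], ["flowers", "happy", "sky"]]
    print(top_tags(retrieved, k=2))
```

Stemming matters here because "flower" and "flowers" would otherwise split their counts and push a frequent concept down the ranking.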

Framework

ANP construction

• Take all (ADJ, N) pairs into consideration
  – Remove pairs whose meaning changes as a named entity (e.g. “hot” + “dog” forms a generic named entity)

• Fuse sentiment values
  – Simple sum-up model: s(ANP) = s(ADJ) + s(N)
  – If sgn(s(ADJ)) != sgn(s(N)), then s(ANP) = s(ADJ)

• Rank ANPs by their frequency
  – Remove all ANPs with no images
  – Resulting in 47K ANP candidates
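The fusion rule fits in a small function. This sketch follows the slide literally (compare signs; the adjective wins when they differ). Inputs are assumed to lie in [-1, 1]; the slides mention no clipping, so the sum can fall outside that range.

```python
def sign(x: float) -> int:
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def anp_sentiment(s_adj: float, s_noun: float) -> float:
    """Fuse adjective and noun sentiment values into an ANP sentiment value."""
    if sign(s_adj) != sign(s_noun):
        # Signs disagree: keep the adjective's sentiment as-is.
        return s_adj
    # Signs agree: simple sum-up model s(ANP) = s(ADJ) + s(N).
    return s_adj + s_noun


if __name__ == "__main__":
    print(anp_sentiment(0.5, 0.25))   # same sign: values are summed
    print(anp_sentiment(0.5, -0.3))   # opposite signs: adjective wins
```

Letting the adjective override the noun reflects the premise stated earlier that adjectives carry most of the sentiment in an ANP.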

ANP construction (cont.)

• Ontology sampling
  – Partition candidates into individual ADJ sets
  – Sample a subset from each ADJ set
  – Take ANPs with sufficient (>125) images

• Linking back to emotions
  – For each ANP, count the images whose metadata contain both the ANP and each of the 24 basic words, creating a 24-dim histogram
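The linking-back step amounts to co-occurrence counting. A sketch, assuming each image's metadata has been reduced to a set of lowercase tag strings and that an ANP appears as a single tag such as "happy dog" (both simplifications; real metadata may include other fields as well).

```python
from typing import Iterable, List, Set


def emotion_histogram(anp: str, image_tags: Iterable[Set[str]],
                      emotion_words: List[str]) -> List[int]:
    """For each emotion word, count images whose tags contain both the ANP and that word."""
    counts = [0] * len(emotion_words)
    for tags in image_tags:
        if anp not in tags:
            continue  # only images tagged with this ANP contribute
        for i, word in enumerate(emotion_words):
            if word in tags:
                counts[i] += 1
    return counts


if __name__ == "__main__":
    images = [
        {"happy dog", "joy"},
        {"happy dog", "joy", "serenity"},
        {"sad cat", "joy"},
    ]
    # With the full list of 24 basic words this yields a 24-dim histogram.
    print(emotion_histogram("happy dog", images, ["joy", "serenity", "grief"]))
```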

How reliable are ANP labels?

• Web annotations may not be reliable
  – Using Flickr tags as pseudo ANP labels might incur false positives

• Manual (= AMT) validation
  – Randomly sample images of 200 ANPs
  – Each image is validated by 3 Turkers and treated as correct only if >= 2 Turkers agree
  – Results: 97% correct
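The 2-of-3 agreement rule is a simple majority vote. A sketch, assuming each Turker's judgment is a yes/no answer to "does this image show the ANP?"; the vote data below is made up for illustration.

```python
from typing import List, Sequence


def is_correct(votes: Sequence[bool], min_agree: int = 2) -> bool:
    """An image counts as a correct ANP example if at least min_agree Turkers say yes."""
    return sum(votes) >= min_agree


def fraction_correct(all_votes: List[Sequence[bool]]) -> float:
    """Share of validated images over all sampled images."""
    return sum(is_correct(v) for v in all_votes) / len(all_votes)


if __name__ == "__main__":
    votes = [
        [True, True, True],
        [True, True, False],
        [True, False, False],
        [True, True, False],
    ]
    print(fraction_correct(votes))  # 3 of 4 images pass
```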

http://visual-sentiment-ontology.appspot.com

Framework

Training ANP detectors

• Various visual features
  – Color histogram (3 color channels x 256 bins), GIST (512 dim), LBP (53 dim), BoW with spatial pyramid and max pooling (1000 dim x 2 layers), attributes [Yu+ CVPR13] (2000 dim)

• Train a linear SVM for every ANP
  – Parameter tuning by cross validation (AP@20-based)
  – Measure performance by AP@20, AUC & F-score

• Several feature fusions
  – Early fusion, late fusion, weighted early/late fusion
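AP@20 scores the top 20 images in a detector's ranking. The slides do not give the formula; this sketch uses one common variant that averages precision at each relevant position within the top k and divides by the number of relevant items found there (other variants divide by min(k, total relevant)).

```python
from typing import Sequence


def average_precision_at_k(scores: Sequence[float], labels: Sequence[int],
                           k: int = 20) -> float:
    """AP@k: mean of precision@i over the relevant items ranked within the top k."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)[:k]
    hits = 0
    precision_sum = 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


if __name__ == "__main__":
    # Relevant items at ranks 1 and 3: (1/1 + 2/3) / 2
    print(average_precision_at_k([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Under any of these variants, AP@20 > 0 simply means at least one true positive appears in the top 20.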

Detector performance

• Comparing visual features (left)
  – Best: attributes; second: BoW

• Comparing feature fusions (right)
  – Weighted late fusion is best, but not by a large margin
  – Early fusion is adopted for implementation simplicity
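To make the fusion variants concrete: early fusion concatenates the feature vectors so that one SVM is trained per ANP, while late fusion trains one SVM per feature and combines their scores. A sketch under assumptions not stated in the slides: each feature block is L2-normalized before concatenation, and late fusion is a (weighted) average of scores.

```python
import math
from typing import Dict, List, Optional


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def early_fusion(features: Dict[str, List[float]],
                 weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Concatenate normalized feature blocks into one vector for a single SVM.

    Passing per-feature weights gives a weighted early fusion.
    """
    fused: List[float] = []
    for name in sorted(features):  # fixed block order
        w = 1.0 if weights is None else weights[name]
        fused.extend(w * x for x in l2_normalize(features[name]))
    return fused


def late_fusion(scores: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-feature detector scores by a (weighted) average."""
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * s for name, s in scores.items()) / total


if __name__ == "__main__":
    feats = {"gist": [3.0, 4.0], "lbp": [1.0]}
    print(early_fusion(feats))                     # one vector, one SVM
    print(late_fusion({"gist": 0.2, "lbp": 0.6}))  # scores from two SVMs
```

The simplicity argument for early fusion is visible here: it needs a single trained model per ANP, whereas late fusion needs one model per feature plus fusion weights.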

Examples

Detectability issues

• Select only ANPs with good detection accuracy
  – 1200 ANPs with AP@20 > 0 & F-score > 0.6

• No correlation between detectability & occurrence frequency
  – Difficulty in detecting ANPs depends on content diversity and level of abstraction
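The selection criterion is a plain threshold filter. A sketch using the thresholds from the slides; the layout of the metrics dictionary and the scores themselves are hypothetical.

```python
from typing import Dict, List


def select_detectors(metrics: Dict[str, Dict[str, float]],
                     min_ap20: float = 0.0, min_f: float = 0.6) -> List[str]:
    """Keep ANPs whose detectors reach AP@20 > min_ap20 and F-score > min_f."""
    return sorted(
        anp for anp, m in metrics.items()
        if m["ap20"] > min_ap20 and m["f_score"] > min_f
    )


if __name__ == "__main__":
    # Made-up validation scores for illustration.
    metrics = {
        "happy dog": {"ap20": 0.45, "f_score": 0.71},
        "lonely road": {"ap20": 0.10, "f_score": 0.55},
        "abstract idea": {"ap20": 0.0, "f_score": 0.62},
    }
    print(select_detectors(metrics))  # ['happy dog']
```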

Other issues

• Specialized visual features improve detectors
  – ObjectBank [Li+ NIPS2010], facial features, aesthetic features [Bhattacharya+ ACMMM13]

• Ontology structure
  – An interactive process combines the 1200 ANPs into distinct groups: 6 levels, 15 nodes at the top
    • N: standard “is-a” relations
    • ADJ: exclusivity (“sad” vs “happy”) & strength (“nice”, “great”, “awesome”)
  – 41% of the nouns are not covered by ImageNet
    • They are related to abstract concepts (e.g. “violence”, “religion”)

Framework

SentiBank applications

• Sentiment prediction in image tweets
  – Sentiment analysis relies on text-based tools
  – 140 characters (in English) are too short
  – Use SentiBank to complement and augment the text

• Emotion classification
  – Evaluate performance on an emotion dataset of art photos [Machajdik+ ACMMM10]

Sentiment prediction in tweets

• Data collection
  – Gather tweets with images & popular hashtags
    • #nuclearpower, #election, #championsleague, #cairo …
  – Use AMT to obtain sentiment ground truth
    • 3 Turkers for every tweet: they mostly agreed (below)

http://www.ee.columbia.edu/ln/dvmm/vso/download/twitter_dataset.html

Sentiment prediction in tweets (cont.)

• Visual-based classifier
  – Use SentiBank as a mid-level representation
    • Use ANP responses as input features
    • Employ a linear classifier for the final output
  – Compare SentiBank with low-level features

• Text-based classifier
  – Naïve Bayes + SentiStrength

• Overall performance

• Detailed performance
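The visual-based classifier stacks two linear stages: ANP detector responses form a feature vector, and a linear classifier maps it to a sentiment label. A sketch with hypothetical toy weights; it assumes raw linear scores as the responses and a binary positive/negative output, neither of which the slides specify.

```python
from typing import List, Sequence, Tuple

Detector = Tuple[List[float], float]  # (weights, bias) of one linear ANP detector


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(w, x))


def anp_responses(x: Sequence[float], detectors: List[Detector]) -> List[float]:
    """Mid-level representation: one detector score per ANP."""
    return [dot(w, x) + b for w, b in detectors]


def predict_sentiment(x: Sequence[float], detectors: List[Detector],
                      clf_w: Sequence[float], clf_b: float) -> str:
    """Linear classifier on top of the ANP responses."""
    score = dot(clf_w, anp_responses(x, detectors)) + clf_b
    return "positive" if score >= 0 else "negative"


if __name__ == "__main__":
    # Hypothetical toy model: 2-dim low-level feature, 2 ANP detectors.
    detectors = [([1.0, 0.0], 0.0), ([0.0, 1.0], -1.0)]
    x = [1.0, 2.0]
    print(anp_responses(x, detectors))                        # [1.0, 1.0]
    print(predict_sentiment(x, detectors, [1.0, -2.0], 0.0))  # negative
```

Compared with feeding low-level features straight into the final classifier, the ANP layer compresses them into a compact set of interpretable, sentiment-bearing scores.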

Emotion classification

• Dataset
  – 807 art photos in 8 emotion categories, retrieved from DeviantArt.com

Takeaway messages

• To be presented at tomorrow’s meeting
