ACMMM 2013 reading: Large-scale visual sentiment ontology and detectors using adjective noun pairs

Post on 15-Jan-2015

description

Brief description of the paper "Large-scale visual sentiment ontology and detectors using adjective noun pairs" presented in ACM Multimedia 2013 as a full paper.

transcript

ACMMM2013 reading @ Kanto CV 2014.2.23

Akisato Kimura (@_akisato) NTT Communication Science Labs

Paper to read

Sentiment analysis of images

Basic strategy

• Adjective noun pairs (ANPs)
  – Adjectives play a significant role in conveying sentiments, but are visually inconsistent.
  – Combined phrases make the concepts more detectable than single adjectives and nouns.
    • cf. Recognition using visual phrases [CVPR11]

Contributions

• Automatically construct a large-scale Visual Sentiment Ontology (VSO) with 3000 ANPs
  – With the help of psychological theories and web mining techniques
• Propose SentiBank: a visual concept detector library to detect the presence of 1200 ANPs
  – Useful for sentiment analysis of visual contents as attributes

Framework

1. Select 24 fundamental words representing emotion
2. Retrieve images with each of the words as a query
3. Extract the tags associated with the images to build ANPs ( = strong sentiment ADJs + all Ns)
4. Train ANP detectors and keep only detectors with reasonable performance to form SentiBank

24 basic words for emotions

• Based on Plutchik’s Wheel of Emotions

http://en.wikipedia.org/wiki/Plutchik%27s_Wheel_of_Emotions#Plutchik.27s_wheel_of_emotions

24 basic words for emotions (cont.)

• 8 basic emotions x 3 degrees
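The 8 x 3 structure can be written down as a small lookup table. This is only a sketch: the slide does not list the words, so the entries below are taken from the commonly drawn version of Plutchik's wheel, and the names `PLUTCHIK_WHEEL` and `EMOTION_WORDS` are made up for illustration.

```python
# 8 basic emotions, each at 3 degrees of intensity (strong, basic, mild).
# Word choices follow the commonly drawn Plutchik wheel (an assumption;
# the slide itself does not enumerate them).
PLUTCHIK_WHEEL = {
    "joy": ("ecstasy", "joy", "serenity"),
    "trust": ("admiration", "trust", "acceptance"),
    "fear": ("terror", "fear", "apprehension"),
    "surprise": ("amazement", "surprise", "distraction"),
    "sadness": ("grief", "sadness", "pensiveness"),
    "disgust": ("loathing", "disgust", "boredom"),
    "anger": ("rage", "anger", "annoyance"),
    "anticipation": ("vigilance", "anticipation", "interest"),
}

# Flatten into the list of query words used for retrieval.
EMOTION_WORDS = [word for degrees in PLUTCHIK_WHEEL.values() for word in degrees]
```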

Sentiment word discovery

• Web mining strategy
  – Retrieve images & videos from Flickr & YouTube with each of the 24 basic words as a query
  – Extract their associated tags with the Lookapp tool [Borth+ ICMR11]

Sentiment word discovery (cont.)

• Exploits various NLP techniques & resources
  – Post-processing
    • Remove stop words, perform stemming
    • Top 100 tags are selected for each emotion
  – Sentiment value computation (-1: negative, +1: positive)
    • SentiWordNet [Esuli+ 2006], SentiStrength [Thelwall+ 2010]
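The post-processing step might look roughly like the sketch below. The stop-word list, the `crude_stem` helper (a stand-in for a real stemmer) and the choice to count each tag once per image are all assumptions for illustration, not details from the paper.

```python
from collections import Counter

# Hypothetical, tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "my"}


def crude_stem(tag):
    """Very rough stand-in for a real stemmer: strips a trailing plural 's'."""
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        return tag[:-1]
    return tag


def top_tags(tags_per_image, k=100):
    """Return the k most frequent cleaned tags for one emotion word."""
    counts = Counter()
    for tags in tags_per_image:
        # Lowercase, drop stop words, stem; the set counts a tag once per image.
        cleaned = {crude_stem(t.lower()) for t in tags if t.lower() not in STOP_WORDS}
        counts.update(cleaned)
    return [tag for tag, _ in counts.most_common(k)]
```

With `k=100` this corresponds to the "top 100 tags per emotion" selection on the slide.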

ANP construction

• Take all the pairs of (ADJ, N)s into consideration
  – Remove pairs that form named entities with a changed meaning (e.g. “hot” + “dog” becomes a generic named entity)
• Fuse sentiment values
  – Simple sum-up model: s(ANP) = s(ADJ) + s(N)
    • If sgn(s(ADJ)) != sgn(s(N)), then s(ANP) = s(ADJ).
• Rank ANPs by their frequency
  – Remove all ANPs with no images
  – Resulting in 47K ANP candidates
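The sum-up rule for fusing sentiment values is short enough to write out directly. The function names are made up for this sketch, and no clipping to [-1, +1] is applied because the slide does not mention any.

```python
def sign(x):
    """Return -1, 0 or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def fuse_sentiment(s_adj, s_noun):
    """s(ANP) = s(ADJ) + s(N); if the signs disagree, the adjective wins."""
    if sign(s_adj) != sign(s_noun):
        return s_adj
    return s_adj + s_noun
```

For example, an adjective with value 0.5 and a noun with value -0.25 yield 0.5, since the adjective is assumed to dominate when the two disagree.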

ANP construction (cont.)

• Ontology sampling
  – Partition candidates into individual ADJ sets
  – Sample a subset from each ADJ set
  – Take ANPs with sufficient (>125) images
• Linking back to emotions
  – For each ANP, count images with the 24 basic words & the ANP in their metadata, and create a 24-dim histogram
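The "linking back to emotions" step can be sketched as a co-occurrence count. Representing each image's metadata as a set of lowercase strings is an assumption made here for simplicity; the function name is hypothetical.

```python
def emotion_histogram(anp, images, emotion_words):
    """Count, per emotion word, the images whose metadata holds both the word and the ANP."""
    hist = [0] * len(emotion_words)
    for meta in images:  # meta: set of lowercase tags/phrases for one image
        if anp not in meta:
            continue
        for i, word in enumerate(emotion_words):
            if word in meta:
                hist[i] += 1
    return hist
```

With the full list of 24 basic words as `emotion_words`, the result is the 24-dim histogram for that ANP.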

How reliable are ANP labels?

• Web annotation may not be reliable
  – Using Flickr tags as pseudo ANP labels might incur false positives
• Manual (=AMT) validation
  – Randomly sample images of 200 ANPs
  – Each image is validated by 3 Turkers, treated as correct only if >= 2 Turkers agree
  – Results: 97% correct
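The agreement rule used in the validation amounts to a majority vote over three judgments. A minimal sketch, with hypothetical function names and toy votes (not the paper's data):

```python
def is_correct(votes, min_agree=2):
    """An image label counts as correct if at least min_agree Turkers confirm it."""
    return sum(votes) >= min_agree


def label_accuracy(votes_per_image):
    """Fraction of images whose pseudo label passes the agreement rule."""
    correct = sum(is_correct(votes) for votes in votes_per_image)
    return correct / len(votes_per_image)
```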

Training ANP detectors

• Various visual features
  – Color histogram (3 colors x 256 dim), GIST (512 dim), LBP (53 dim), BoW with spatial pyramid and max pooling (1000 dim x 2 layers), attributes [Yu+ CVPR13] (2000 dim)
• Training a linear SVM for every ANP
  – Parameter tuning by cross validation (AP@20-based)
  – Measure performance by AP@20, AUC & F-score.
• Several feature fusions
  – Early fusion, late fusion, weighted early/late fusion
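AP@20 can be computed from a ranked list of relevance labels. Several variants of the definition exist; the sketch below assumes the one that averages precision over the relevant items found in the top k, which may differ from the exact variant used in the paper.

```python
def average_precision_at_k(ranked_labels, k=20):
    """ranked_labels: 0/1 relevance of results, sorted by detector score, best first."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_labels[:k], start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0
```

A detector that places no relevant image in its top 20 gets AP@20 = 0 under this definition.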

Detector performance

• Comparing visual features (left)
  – 1st: attributes, 2nd: BoWs
• Comparing feature fusions (right)
  – 1st: Weighted late fusion, but not dominant
  – Adopt early fusion for implementation simplicity
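The difference between the two fusion families can be illustrated in a few lines. This is a generic sketch of early and (weighted) late fusion, not the paper's implementation; in particular, the weighted average in `late_fusion` is an assumed combination rule.

```python
def early_fusion(feature_vectors):
    """Concatenate per-feature vectors into one vector; a single classifier is trained on it."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def late_fusion(scores, weights=None):
    """Combine the scores of per-feature classifiers by a (weighted) average."""
    if weights is None:
        weights = [1.0] * len(scores)
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total
```

Early fusion needs only one classifier per ANP, which is consistent with the slide's remark about implementation simplicity.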

Examples

Detectability issues

• Select only ANPs with good detection accuracy
  – 1200 ANPs with AP@20>0 & F-score>0.6
• No correlation between detectability & occurrence
  – Difficulty in detecting ANPs depends on the content diversity and the level of abstraction
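The selection rule for forming SentiBank is a simple threshold filter. A sketch, assuming the per-detector metrics are held in a dictionary with the hypothetical keys `ap20` and `f_score`:

```python
def select_detectors(metrics, min_f_score=0.6):
    """Keep ANPs whose detector has AP@20 > 0 and F-score above the threshold."""
    return sorted(
        anp
        for anp, m in metrics.items()
        if m["ap20"] > 0 and m["f_score"] > min_f_score
    )
```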

Other issues

• Special visual features improve detectors
  – ObjectBank [Li+ NIPS2010], facial features, aesthetic features [Bhattacharya+ ACMMM13]
• Ontology structure
  – Interactive process to combine 1200 ANPs into distinct groups: 6 levels, 15 nodes at the top
    • N: standard “is-a” relations
    • ADJ: exclusive (“sad” vs “happy”) & strength (“nice”, “great”, “awesome”)
  – 41% of nouns are not covered by ImageNet
    • Related to abstract concepts (e.g. “violence”, “religion”)

SentiBank applications

• Sentiment prediction in image tweets
  – Sentiment analysis relies on text-based tools
  – 140 characters (in ENG) are too short
  – Use SentiBank to complement and augment texts
• Emotion classification
  – Demonstrate the performance against an emotion dataset of art photos [Machajdik+ ACMMM10]

Sentiment prediction in tweets

• Data collection
  – Gather tweets with images & popular hashtags
    • #nuclearpower, #election, #championsleague, #cairo …
  – AMT to obtain sentiment ground-truth
    • 3 Turkers for every tweet: they almost always agreed (below)

http://www.ee.columbia.edu/ln/dvmm/vso/download/twitter_dataset.html

Sentiment prediction in tweets (cont.)

• Visual-based classifier
  – Use SentiBank as a mid-level representation
    • Use ANP responses as an input feature
    • Employ a linear classifier for the final output
  – Compare SentiBank with low-level features
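The mid-level idea is that each image is described by its vector of ANP detector responses, and a linear classifier on top of that vector predicts the sentiment. The sketch below uses logistic regression trained by plain gradient descent as one possible linear classifier; that choice, the two-detector toy data and all names are assumptions for illustration.

```python
import math


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit a logistic-regression (linear) classifier with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob. minus target
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (positive sentiment) or 0 (negative) for one response vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy responses of two hypothetical ANP detectors: [positive ANP, negative ANP].
X_train = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y_train = [1, 1, 0, 0]
w, b = train_logistic(X_train, y_train)
```

In the real setting the input vector would have one entry per SentiBank detector (1200 ANPs) rather than two.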

Sentiment prediction in tweets (cont.)

• Text-based classifier
  – Naïve Bayes + SentiStrength
• Overall performance

Sentiment prediction in tweets (cont.)

• Detailed performance

Emotion classification

• Dataset
  – 807 art photos, 8 emotion categories, retrieved from DeviantArt.com

Takeaway messages

• To appear in tomorrow’s meeting