Uploaded by arjen-de-vries on 05-Jun-2015 (category: Technology)
Context Adaptation
GOAL:
Present a sports journalist who queries for "Beckham" with different photos than a glossy-magazine editor who issues the same query
IPTC Categories
• ACE (arts, culture, entertainment)
• CLJ (crime, law & justice)
• DIS (disasters & accidents)
• EBF (economy, business & finance)
• EDU (education)
• ENV (environment)
• HTH (health)
• HUM (human interest)
• LAB (labour, work)
• LIF (lifestyle & leisure)
• POL (politics)
• REL (religion)
• SCI (science & technology)
• SOI (social issues)
• SPO (sports)
• WAR (unrest, conflicts, war)
• WEA (weather)
What Context?
• Collection context
– One "main" IPTC category per image
– 96,351 out of 97,760 images in the 100k Belga Collection have one
– Note: noisy data, in spite of it being edited content! E.g., we found lifestyle Beckham images annotated as SPO, and even typos in the IPTC category assignment
• User context
– Classified 813 users into IPTC categories to represent their main interest (based on Belga input about the users' organizations)
Filter on IPTC?
//image[@IPTC eq SPO][about(.,Beckham)]
• Bad for recall:
– Not all images have been assigned IPTC categories
• Bad for precision:
– Noisy assignment of IPTC categories to images
– At least 4 of the top 10 SPO Beckham results do not show Beckham taking part in sporting activities
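To make the recall and precision problems concrete, here is a minimal Python sketch of the strict filter on toy data. The `Image` record, the `text_score` field standing in for `about(.,Beckham)`, and the four example images are all hypothetical; the point is only that a hard equality test on a noisy, incomplete label both drops unlabelled matches and keeps mislabelled ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    image_id: str
    iptc: Optional[str]  # None when no IPTC category was assigned
    text_score: float    # hypothetical score for about(., Beckham); 0 means no match


def hard_filter(images: List[Image], category: str) -> List[Image]:
    """Strict filter in the spirit of //image[@IPTC eq SPO][about(.,Beckham)]:
    keep only text matches whose IPTC label equals `category`."""
    kept = [im for im in images if im.iptc == category and im.text_score > 0]
    return sorted(kept, key=lambda im: im.text_score, reverse=True)


# Hypothetical toy collection
images = [
    Image("a", "SPO", 2.1),  # genuine sports shot
    Image("b", None, 3.0),   # sports shot without a category: silently dropped (recall)
    Image("c", "SPO", 1.5),  # lifestyle shot mislabelled as SPO: kept anyway (precision)
    Image("d", "LIF", 2.7),
]
print([im.image_id for im in hard_filter(images, "SPO")])  # ['a', 'c']
```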
Retrieval Model
• Re-rank results based on cluster membership:
$$\lambda\,\rho_d(q) + (1-\lambda)\sum_{c \in \mathit{Clusters}} \rho_c(q)\,\rho_c(d)$$
– Modify scores based on the document's context (Oren Kurland and Lillian Lee. ACM Transactions on Information Systems (TOIS), 27(3), 2009.)
• Novelty in Vitalas:
– Modify scores based on the user's context
• Cluster formation based on user clicks
• Cluster selection based on user context
[Diagram: P(Q|D), P(Q|c), P(D|c)]
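The re-ranking formula can be sketched in Python as follows. This is a minimal illustration, not the Vitalas implementation: the score dictionaries are hypothetical inputs, and cluster selection is assumed to simply restrict the sum to the clusters matching the user's IPTC category.

```python
from typing import Dict, Iterable, List

Scores = Dict[str, float]


def cluster_score(doc: str, lam: float, rho_d_q: Scores, rho_c_q: Scores,
                  rho_c_d: Dict[str, Scores], clusters: Iterable[str]) -> float:
    """lambda * rho_d(q) + (1 - lambda) * sum over clusters of rho_c(q) * rho_c(d).
    `clusters` holds the clusters kept by (assumed) user-context selection."""
    cluster_part = sum(rho_c_q[c] * rho_c_d[c].get(doc, 0.0) for c in clusters)
    return lam * rho_d_q[doc] + (1.0 - lam) * cluster_part


def rerank(docs: List[str], lam: float, rho_d_q: Scores, rho_c_q: Scores,
           rho_c_d: Dict[str, Scores], clusters: Iterable[str]) -> List[str]:
    clusters = list(clusters)  # iterated once per document
    return sorted(
        docs,
        key=lambda d: cluster_score(d, lam, rho_d_q, rho_c_q, rho_c_d, clusters),
        reverse=True,
    )


# Hypothetical toy scores: the text ranking prefers d1, but only d2 is in the SPO cluster.
rho_d_q = {"d1": 0.6, "d2": 0.4}
rho_c_q = {"SPO": 0.5, "DIS": 0.5}
rho_c_d = {"SPO": {"d2": 1.0}, "DIS": {"d1": 1.0}}

print(rerank(["d1", "d2"], 1.0, rho_d_q, rho_c_q, rho_c_d, ["SPO"]))  # ['d1', 'd2']
print(rerank(["d1", "d2"], 0.1, rho_d_q, rho_c_q, rho_c_d, ["SPO"]))  # ['d2', 'd1']
```

With λ=1.0 only the initial text score counts; with λ=0.0 the initial score is ignored entirely and only cluster membership decides the order.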
Retrieval Model
• Cluster formation:
– IPTC-image categories: forms disjoint clusters
– IPTC-user categories of users who clicked the image: gives overlapping clusters
• Cluster selection:
– {d ∈ c}: cluster contains the document
– {u ∈ c}: cluster/@category corresponds to the user's interests
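A minimal sketch of the two cluster-formation strategies and the selection step, assuming a click log of `(user, image)` pairs and one IPTC category per user and per image; all function names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Clusters = Dict[str, Set[str]]  # IPTC category -> set of image ids


def collection_clusters(image_category: Dict[str, str]) -> Clusters:
    """IPTC-image categories: each image has one main category, so clusters are disjoint."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for image, category in image_category.items():
        clusters[category].add(image)
    return dict(clusters)


def user_clusters(clicks: List[Tuple[str, str]], user_category: Dict[str, str]) -> Clusters:
    """IPTC-user categories: an image joins the cluster of every category whose
    users clicked it, so clusters may overlap. Unclassified users are ignored."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for user, image in clicks:
        if user in user_category:
            clusters[user_category[user]].add(image)
    return dict(clusters)


def select_clusters(clusters: Clusters, doc: str, user_interest: str) -> List[str]:
    """Keep clusters that contain the document ({d in c}) and whose category
    matches the user's interest ({u in c})."""
    return [c for c, members in clusters.items() if doc in members and c == user_interest]


# Hypothetical toy data
image_category = {"img1": "SPO", "img2": "LIF", "img3": "SPO"}
clicks = [("u1", "img1"), ("u2", "img1"), ("u2", "img2")]
user_category = {"u1": "SPO", "u2": "LIF"}

uc = user_clusters(clicks, user_category)
print(sorted(c for c, m in uc.items() if "img1" in m))  # ['LIF', 'SPO']: img1 is in two clusters
print(select_clusters(uc, "img1", "SPO"))                # ['SPO']
```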
Results on Click Prediction
(rows: D, then image-based and user-based cluster formation at λ = 0.0, 0.1, 0.4, 0.7; columns: IPTC category)
NDCG       ACE     EBF     EDU     HTH     HUM     LAB     LIF     POL     SOI     SPO
D          0.1724  0.5527  0.0145  0.1308  0.1849  0.1331  0.1245  0.0723  0.2880  0.1811
image 0.0  0.1423  0.4744  0.0163  0.1347  0.1612  0.1543  0.0888  0.0586  0.1806  0.1801
image 0.1  0.1741  0.5460  0.0145  0.1308  0.1798  0.1331  0.1234  0.0704  0.2883  0.1809
image 0.4  0.1721  0.5497  0.0145  0.1308  0.1772  0.1331  0.1233  0.0717  0.2880  0.1806
image 0.7  0.1721  0.5504  0.0145  0.1308  0.1849  0.1331  0.1232  0.0721  0.2880  0.1807
user 0.0   0.2070  0.4882  0.0165  0.6342  0.2109  0.2164  0.1894  0.1054  0.2964  0.2151
user 0.1   0.1978  0.5519  0.0167  0.3712  0.2043  0.2339  0.1555  0.0990  0.2970  0.2005
user 0.4   0.1767  0.5509  0.0155  0.1934  0.1776  0.1817  0.1121  0.0916  0.2968  0.1839
user 0.7   0.1747  0.5509  0.0146  0.1414  0.1760  0.1380  0.1253  0.0769  0.3008  0.1820
Related literature on evaluation methodology: Carterette and Jones, NIPS 2007, and, Carterette, Allan, and Sitaraman, SIGIR 2006.
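The slides do not spell out how NDCG is computed from clicks. A common choice, assumed here, is to treat clicked images as relevant (gain 1), all others as non-relevant, and cut off at rank 10; the function names and the cutoff are hypothetical.

```python
import math
from typing import List, Set


def dcg(gains: List[float]) -> float:
    # Log2 discount: the item at 1-based rank r contributes gain / log2(r + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_from_clicks(ranking: List[str], clicked: Set[str], k: int = 10) -> float:
    """Binary gains from clicks (an assumption); the ideal ranking puts all clicked images first."""
    gains = [1.0 if doc in clicked else 0.0 for doc in ranking[:k]]
    ideal_dcg = dcg([1.0] * min(len(clicked), k))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


print(round(ndcg_from_clicks(["a", "b", "c"], {"b"}), 4))  # 0.6309
print(ndcg_from_clicks(["b", "a"], {"b"}))                 # 1.0
```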
[Screenshot: No Adaptation "Greece"]
[Screenshot: SPO Adaptation "Greece, collection-based clusters, λ=0.1"]
[Screenshot: SPO Adaptation "Greece, collection-based clusters, λ=0.0"]
[Screenshot: SPO Adaptation "Greece, user-based clusters, λ=0.1"]
[Screenshot: SPO Adaptation "Greece, user-based clusters, λ=0.0"]
SPO Observations
• Re-ranking pushes the sports-related images to the top
– No more images about the fires
– When λ=0.0 the initial retrieval score is not taken into account (initial text ranking ignored)
• Minimal differences between collection-based and user-based cluster formation
– Archivists consider as sports-related those images that users with sports-related interests click on
[Screenshot: POL Adaptation "Greece, collection-based clusters, λ=0.1"]
[Screenshot: POL Adaptation "Greece, collection-based clusters, λ=0.0"]
[Screenshot: POL Adaptation "Greece, user-based clusters, λ=0.1"]
[Screenshot: POL Adaptation "Greece, user-based clusters, λ=0.0"]
POL Observations
• Re-ranking for a politics context shows a difference in interpretation between the archivist and the user group
– Archivists focussed on the actual political rallies etc.
– Users focussed on the forest fires
[Screenshot: ACE Adaptation "Greece, collection-based clusters, λ=0.1"]
[Screenshot: ACE Adaptation "Greece, collection-based clusters, λ=0.0"]
ACE Observations
• Re-ranking for arts, culture and entertainment requires λ=0.0, to ignore the initial ranking and let the right images shine
[Screenshot: No Adaptation "Beckham"]
[Screenshot: SPO Adaptation "Beckham, collection-based clusters, λ=0.1"]
[Screenshot: SPO Adaptation "Beckham, collection-based clusters, λ=0.0"]
[Screenshot: HUM Adaptation "Beckham, collection-based clusters, λ=0.1"]
Conclusions so far
• Adaptation also retrieves images that were not assigned an IPTC category, by considering clusters formed by the images clicked by users with the same interests
• Alternative cluster formation approaches can be investigated, e.g., using visual features
• The method is easily adapted for personalised and/or collaborative search
Potential for Personalization
• Which queries have the potential to benefit from context adaptation (personalisation)?
• Those for which different users click on different results
– Can be studied by looking at the nDCG of one user, assuming another user's clicks are ideal (Jaime Teevan, Susan T. Dumais and Eric Horvitz. Potential for Personalization. ACM Transactions on Computer-Human Interaction (ToCHI) special issue on Data Mining for Understanding User Needs, 17(1), March 2010.)
• Novel in Vitalas: compare IPTC-defined user groups (instead of individual users)
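A sketch of the group-level variant: rank images by one IPTC user group's clicks (the ideal ranking for that group), score that ranking with nDCG using another group's clicks as relevance, and average over all ordered pairs of groups. Using click counts as graded gains and averaging over pairs are assumptions for illustration, not necessarily the exact Vitalas procedure; names and data are hypothetical. A low value means the groups disagree, i.e. high potential for personalization.

```python
import itertools
import math
from typing import Dict, List

Clicks = Dict[str, float]  # image id -> click count for one user group


def dcg(gains: List[float]) -> float:
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(ranking: List[str], relevance: Clicks, k: int = 10) -> float:
    gains = [relevance.get(doc, 0.0) for doc in ranking[:k]]
    best = dcg(sorted(relevance.values(), reverse=True)[:k])
    return dcg(gains) / best if best > 0 else 0.0


def cross_group_ndcg(group_clicks: Dict[str, Clicks], k: int = 10) -> float:
    """Mean nDCG of group A's click-ideal ranking, judged by group B's clicks,
    over all ordered pairs (A, B) of distinct groups."""
    if len(group_clicks) < 2:
        raise ValueError("need at least two user groups")
    scores = []
    for a, b in itertools.permutations(group_clicks, 2):
        ideal_for_a = sorted(group_clicks[a], key=group_clicks[a].get, reverse=True)
        scores.append(ndcg(ideal_for_a, group_clicks[b], k))
    return sum(scores) / len(scores)


# Hypothetical click counts per IPTC user group for two queries
divergent = {"SPO": {"img1": 3}, "POL": {"img2": 2}}
agreeing = {"SPO": {"img1": 3, "img2": 1}, "POL": {"img1": 2, "img2": 1}}

print(cross_group_ndcg(divergent))  # 0.0 -> high potential
print(cross_group_ndcg(agreeing))   # 1.0 -> low potential
```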
P4P in Belga 100K
[Figure: potential for personalization per query]
nDCG low: high potential
nDCG high: low potential
Examples: greece (0.3910), Dean (0.8067), King Albert II (0.7810)
[Screenshot: No Adaptation "King Albert II"]
[Screenshot: EBF Adaptation "King Albert II"]
[Screenshot: POL Adaptation "King Albert II"]
[Screenshot: No Adaptation "Dean"]
[Screenshot: ACE Adaptation "Dean, user-based clusters"]
[Screenshot: ACE Adaptation "Dean, collection-based clusters"]
Dean: Temporal Effect
• Log files: "Dean" = "Hurricane Dean"
• Still, the query is quite ambiguous:
– James Dean
– Agyness Dean (a model)
– a (university) dean
– Dean Dealannoi
– Howard Dean
– Dean Martin
• Context adaptation for "Dean" requires the archivist's context (collection-based clusters)
Future Work
• Address various normalization issues
– In context adaptation (due to the NLLR approximation)
– In "potential for personalization"/adaptation
• Explore the temporal dimension
– Combinations of collection and user context?
• Explore cross-media cluster-based retrieval
– Use visual features in cluster formation
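The slides do not detail how NLLR enters the model. One plausible reading, assumed here, is that query scores such as ρ_d(q) are approximated by the normalized log-likelihood ratio. The sketch below computes NLLR with Jelinek-Mercer smoothing (the smoothing scheme and the weight `alpha` are assumptions) and shows why normalization matters: the score is a log ratio that can be negative, not a probability, so it does not sit on the same scale as the cluster term of the interpolation.

```python
import math
from collections import Counter
from typing import List


def nllr(query: List[str], doc: List[str], collection: Counter, alpha: float = 0.8) -> float:
    """Normalized log-likelihood ratio: sum_t P(t|q) * log(P(t|d) / P(t|C)),
    with P(t|d) smoothed as alpha * P_ml(t|d) + (1 - alpha) * P(t|C).
    Jelinek-Mercer smoothing and alpha=0.8 are assumptions for this sketch."""
    total = sum(collection.values())
    doc_tf = Counter(doc)
    doc_len = len(doc) or 1  # avoid division by zero for an empty document
    q_tf = Counter(query)
    score = 0.0
    for term, qf in q_tf.items():
        p_c = collection[term] / total
        if p_c == 0:
            continue  # term unseen in the collection: skipped in this sketch
        p_d = alpha * doc_tf[term] / doc_len + (1 - alpha) * p_c
        score += (qf / len(query)) * math.log(p_d / p_c)
    return score


# Hypothetical toy collection and documents
collection = Counter("beckham football greece fire beckham fashion".split())
query = ["beckham", "football"]
on_topic = ["beckham", "football"]
off_topic = ["greece", "fire"]

print(nllr(query, on_topic, collection) > 0)   # True
print(nllr(query, off_topic, collection) < 0)  # True: a negative "score", not a probability
```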
See also
“CWI” Vitalas demonstrations:
http://www.ins.cwi.nl/projects/M4/vitalas/
Collection context instead of user context:
http://www.ins.cwi.nl/projects/M4/vitalas/context_adaptation.html
Detectors trained by query log
http://olympus.ee.auth.gr/diou/civr2009/