Dynamic information filtering
Patrick Baudisch
Xerox PARC
March 26, 2001
2
Contents
• Introduction
• Requirements and related work
• The TV Scout
– …as a retrieval system
– …and as a filtering system
• How it works
– The QuerySet Architecture
– Building QuerySet filtering systems
– Manual profile editing
• Conclusions
4
Motivation: Information overload
• Too many
– research papers
– books
– movies
– web pages
– …
– even TV programs!
• Goal: alleviate information overload
5
IF, IR, and dynamic filtering
• Analytic information seeking strategies
– Retrieval (IR): changing interests, stable database
– Filtering (IF): changing sources, stable interests
• Many applications fit in
– dictionaries => IR
– music => IF
• Others fit into neither niche
– High source and need change rate
– Example: stock market
– [Oard 96]: “Grand challenge”
[Figure: plane spanned by information source change rate and information need change rate, with regions labeled Filtering, Retrieval, and Dynamic information filtering]
6
Objective of dynamic filtering
• Adaptation speed is crucial
– (user profile = interest) is crucial for filtering accuracy
– Interest changes: (profile ≠ interest) => filtering quality drops
– Adapt profile as fast as possible
• Subject of this thesis: Filtering architecture for maximum adaptation speed
8
Requirements
• Requirement 1: Exhaustiveness (arbitrary interests)
– (King and Sacramento), but not (King and Queen), INFOS [Mock 96]
• Requirement 2: Output style (single ranking preferred)
– Boolean output, Info. Lens [Malone 87]; Categories, SIFT [Yan 95]
• Requirements 3-5: Adapt to interest changes
                              Permanent                                             Temporary
Rapidly (caused by event)     Abrupt change [Marchionini 95, Lam 96, Frisse 89…]    Repetitive change [Allen 90, Loeb 92, Kay 95, …]
Slowly (caused by process)    Gradual change [Belkin 92, Baclace 91, Lang 95, ...]
9
R3: Learning from relevance feedback
[Figure, after [Jennings 91, p.207]: interest plotted over time; labels: actual interests, user profile, delayed profile, error]
• Newt [Sheth and Maes 93]
• WebMate [Chen and K. Sycara]
• GroupLens [Konstan et al 97]
10
R4: Limitations of manual profile editing
Problems with gradual changes
Rule-based systems
• Information Lens [Malone et al 87]
• ISCREEN [Pollock 88]
• INFOSCOPE [Fischer 91]
[Figure labels: user interest, delayed reaction, error]
11
Resulting design guideline
• Build a filtering system that allows
– learning from relevance feedback (for gradual changes)
– users to edit their profiles directly (for abrupt changes)
• and
– that uses a “meaningful” model for the user profiles, so that users understand how to edit them
13
[Figure labels: Query frame, Content frame]
14
Q1. select a query
[Figure labels: Best match, Exact match]
15
Q2. read & retain program descriptions
…print them out, take them home
[Figure labels: program description list, program description table, retention menus, video labels, laundry list]
16
Q3. suggestions
suggest queries
19
Best match profile (QuerySet profile)
QuerySet profile editor
(Expert mode)
QuerySet Profile: Personal program per single mouse click
20
Summary
TV Scout interface with starting page
[Figure labels: viewing time profile editor, channel profile editor, query menus, QSA menu, text search, program description list, program description table, suggest queries, QSA profile editor, QSA profile editor (experts), retention menus, video labels, laundry list]
21
Incremental usage
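[Figure: state diagram with three states: queries (one shot state), bookmarks (reuse state), and QSA profile (filtering state). Transitions are labeled S1 to S3, U1 to U3, and T1 to T3, with the captions start, system provides, user writes, user defines, system suggests, system compiles, user updates, and system learns]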
22
Studies done on the TV Scout so far
• Comparison of individual query classes
– > 13,000 registered users
– Predefined queries (genres) covered most interests
– Text search for what genres do not cover: search for actors, series, topics
– “Opinion leader” recommendation was 5th most popular query
• Long-term study still outstanding
24
QuerySet profile vs. other user profiles
• Queries in QSA profile intended to represent different interests
– != query representation nodes
– != concepts (or facets) that are part of a query/interest.
– != IR query that represents a single interest only
[Figure: a conventional user profile (nodes r1 … rm and d1 … dj) next to a QSA profile, in which queries q1 … qn (e.g. news, sports, comedy shows) are combined by a node A that answers “How does user like news compared to sports…?”]
This is not (necessarily) an inference network
25
Objective of that decomposition
• Several interest changes can be handled with minor profile changes
– “I am not in the mood for action movies today”
=> Update only query weight in aggregation function
Benefit: all queries remain unaffected
– “My taste in action movies has changed”
=> Edit only action movies query
Benefit: all other queries remain unaffected
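The two kinds of change above can be illustrated with a minimal sketch. It assumes a profile is a set of named queries plus one weight per query, combined by a weighted sum; the class `QSAProfile`, the genre-based toy queries, and the program records are hypothetical and not taken from TV Scout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Program = Dict[str, List[str]]        # hypothetical program record
Query = Callable[[Program], float]    # a query assigns a rating to an object


@dataclass
class QSAProfile:
    """Hypothetical profile: named queries plus one weight per query."""
    queries: Dict[str, Query] = field(default_factory=dict)
    weights: Dict[str, float] = field(default_factory=dict)

    def rate(self, program: Program) -> float:
        # Assumed aggregation function: weighted sum of the query ratings.
        return sum(self.weights[name] * query(program)
                   for name, query in self.queries.items())


def likes_action(program: Program) -> float:
    return 1.0 if "action" in program["genres"] else 0.0


def likes_comedy(program: Program) -> float:
    return 1.0 if "comedy" in program["genres"] else 0.0


profile = QSAProfile(
    queries={"action movies": likes_action, "comedies": likes_comedy},
    weights={"action movies": 1.0, "comedies": 1.0},
)
action_movie = {"genres": ["action"]}
comedy_show = {"genres": ["comedy"]}
rating_before = profile.rate(action_movie)

# "I am not in the mood for action movies today": change one weight only.
profile.weights["action movies"] = 0.0
rating_in_mood = profile.rate(action_movie)
comedy_in_mood = profile.rate(comedy_show)

# "My taste in action movies has changed": restore the weight and
# replace only the action movies query; the comedy query is untouched.
profile.weights["action movies"] = 1.0
profile.queries["action movies"] = (
    lambda p: 1.0 if {"action", "sci-fi"} <= set(p["genres"]) else 0.0
)
rating_after_taste = profile.rate(action_movie)
```

In both cases the edit touches a single entry of the profile, which is the point of the decomposition.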
26
Make queries correspond to interests
• Selection principle
– Make a query what will change as a whole
– It is interests that change
– => Use queries corresponding to interests
• Negative examples
– Data fusion (e.g. [Fox 94, Lee 97]) => redundancy
– Automated collaborative filtering => overlap
• Positive example:
– The incremental usage supported by QSA systems: use as query, then bookmark, then use as profile
[Figure: the incremental usage state diagram with the states queries (one shot state), bookmarks (reuse state), and QSA profile (filtering state)]
28
How to build QSA systems? Reuse!
[Figure: architecture diagram. Components: query-executing IR/IF subsystems (Sybase, FreeWAIS, print import, <more>), pre-conversion, IR/IF subsystem running the aggregation function, post-conversion, re-post-conversion, re-pre-conversion. Data labels: query ratings, output rating, relevance ratings, relevance feedback, aggregation feedback, query feedback]
29
Aggregation subsystem
• Example
– User profile = {action movies, comedies, Tips by Lars}
– Aggregation: turn these three rankings into a single ranking
– Is a program rated {0.4 action movie, 0.3 comedy, “excellent” by Lars} better than one rated {0 action movie, 0.8 comedy, “ok” by Lars}?
• Notion of tradeoffs similar to IR/IF systems on term frequencies
– Query = {“information”, “retrieval”}
– Is a web page {0.4 information, 0.3 retrieval} better than another web page {0 information, 0.8 retrieval}?
• => Reuse IR/IF systems
• Weighted request and indexing retrieval model
– Output rating(object) = Sum of query ratings
– TV Scout: Overlap between queries was small enough => This model is sufficient
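The sum-of-query-ratings model can be sketched as follows for the example profile above. This is an illustration under stated assumptions, not the TV Scout implementation: the function `aggregate`, the optional per-query weights, and the numeric scale for Lars's verbal tips (`TIP_SCALE`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pre-conversion of Lars's verbal tips into numeric ratings.
TIP_SCALE = {"excellent": 1.0, "ok": 0.5}


def aggregate(query_ratings: Dict[str, Dict[str, float]],
              weights: Optional[Dict[str, float]] = None
              ) -> List[Tuple[str, float]]:
    """Output rating(object) = sum of (weighted) query ratings."""
    totals: Dict[str, float] = {}
    for query, ratings in query_ratings.items():
        # Without explicit weights every query counts fully.
        weight = 1.0 if weights is None else weights.get(query, 0.0)
        for obj, rating in ratings.items():
            totals[obj] = totals.get(obj, 0.0) + weight * rating
    # Single ranking: best output rating first, ties broken by name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


query_ratings = {
    "action movies": {"program A": 0.4, "program B": 0.0},
    "comedies": {"program A": 0.3, "program B": 0.8},
    "tips by Lars": {"program A": TIP_SCALE["excellent"],
                     "program B": TIP_SCALE["ok"]},
}
ranking = aggregate(query_ratings)
comedy_only = aggregate(query_ratings, {"comedies": 1.0})
```

With these assumed numbers program A ranks above program B; a different tip scale or different query weights can reverse the order, which is exactly the tradeoff the slide asks about.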
31
Simple case: “Rate a query”
• What is the general concept behind profile editors?
• Rate a query as a whole: “How do you like science fiction movies?”
• => This is fast, because users can take experience with and expectations about the query into account
• But what if the user loves news programs, but wants only a few top-ranked ones? (redundancy between news)
32
General case: “Rate a set”
• Generalization
– Ask user to rate arbitrary set of objects
– Example: “How do you like {Back to the future, Brazil, Blade runner, 1984…Metropolis}?”
• User-aggregated relevance feedback
– The user mentally assigns a rating to each object
– The user aggregates these and tells the system the result
– This saves the effort of communicating individual ratings
• Benefit
– “Rate a query” is a special case of “Rate a set”
– This makes both compatible with relevance feedback
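One way to read the compatibility claim is sketched below. It assumes that a set rating is turned into ordinary relevance feedback by attaching the same rating to every member of the set; this expansion rule and the names `rate_set` and `rate_query` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Feedback = List[Tuple[str, float]]   # (object, rating) feedback samples


def rate_set(objects: Iterable[str], rating: float) -> Feedback:
    """Expand one user-aggregated rating into per-object feedback samples."""
    return [(obj, rating) for obj in objects]


def rate_query(query_results: Dict[str, float], rating: float) -> Feedback:
    """'Rate a query' as a special case: the set is the query's result set."""
    return rate_set(query_results.keys(), rating)


science_fiction = ["Back to the Future", "Brazil", "Blade Runner", "Metropolis"]
set_feedback = rate_set(science_fiction, 0.9)

# Hypothetical query result (object -> query rating), rated as a whole.
query_feedback = rate_query({"Brazil": 0.7, "Metropolis": 0.6}, 0.2)
```

Both calls yield the same kind of (object, rating) samples, so a learner built for relevance feedback could consume either.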
33
Combine both
• Goal: find a way
– as simple and fast as “rate a query”
– as flexible as “rate a set”
• Solution
– Use top and bottom ranks of queries (and others)
– Extensible to arbitrary ranks -> Histogram-based interfaces
“How much do you like top-ranked news programs?”
“How much do you like bottom-ranked news programs?”
34
Profile editor framework
Query-wise: preferable if few queries (e.g. query inserted)
Property-wise: preferable if many queries (e.g. mood change)
Few URF samples (simplicity): form-based interfaces
Many URF samples (accuracy): histogram-based interfaces
Paintable interfaces
[Figure: editor thumbnails: a ranked program list with a cut-off line (Dead Poets Society, Bayern-Manchester, Amazons on Mars, Le Grand Bleu, Sat1 ran) and a paintable query map (Action movies, Information, Sports, Comedy, Sitcom, Series, Soap, B. Hills, M. Arts, Schwarz.., Simpsons, M.A.S.H., Basketball, C. music, Theater, Golf) with history, undo, save, and execute controls]
35
Paintable interfaces
36
Example for multiple select
37
Multiple select applied to interest
[Figure: two versions of a query map with the entries Information, Sports, Beverly Hills 90210, Endorsed by Paul, Comedy, “Action AND Comedy”, Action movies, Schwarzenegger, Endorsed by Lars, M.A.S.H., Basketball, Classic music, Theater, Golf, and Series; the second version omits Sports, Basketball, and Golf]
38
Multiple select versus painting
Painting: function (tool) selection first, then pixel selection (painting)
Multiple select: pixel selection first, then function selection
Immediate visual feedback allows differentiated input
39
Layout by co-occurrence
[Figure: keypad of menu items arranged by co-occurrence: Danish, Milk, Pancakes, Orange Juice, Bacon, French Toast, English muffin, Hash Browns, Ham, Eggs, Root Beer, Milk Shake, Cookie, Chick Sand, Iced Tea, Fish sand, Fruit Pie, Sundae, Cheese Burger, Ham Burger, French Fries, Cola, Onion Rings, Coffee, TOTAL]
40
A paintable profile editor
[Figure: paintable query map (Action movies, Information, Sports, Comedy, Series, Soap, B. Hills, M. Arts, Schwarz.., Simpsons, M.A.S.H., Basketball, C. music, Theater, Golf) with history, undo, save, and execute controls, shown twice; the second version contains the additional query Sitcom]
Insertion of “sitcom”
41
Paintable time and channel editors
• Interval sliders are split into segments
• No handles, just paint the addition
• Intervals labeled as entities to reduce cluttering
43
QSA vs. requirements
• Requirement 1: Exhaustiveness => arbitrary interests
• Requirement 2: Output style => single ranking
• Requirements 3-5: Adapt to interest changes
                              Permanent                                           Temporary
Rapidly (caused by event)     Abrupt change: user-aggregated relevance feedback   Repetitive change: reuse of old queries (weight set to zero)
Slowly (caused by process)    Gradual change: relevance feedback
44
Achievements of the dissertation
• (1) a new generic IF system architecture designed for the efficient handling of highly dynamic interests (the QuerySet Architecture)
• (2) a new paradigm of high-level access to user profiles (user-aggregated relevance feedback)
• (3) a framework of new user interface interaction styles providing users with this high-level access
• (4) a proof of concept implementation (TV Scout)
45
Future work
• (1) new application areas
• (2) new query classes
• (3) improved aggregation functions
• (4) new profile editor user interfaces
• (5) empirical work.
46
END
47
Image processing
[Figure: histogram of number of pixels over luminance, showing the current state and the desired state of the image. Annotations: there are no black pixels; there are no white pixels; only rather dark pixels. The white handle assigns 100% luminance, the gray handle assigns 50% luminance, and the black handle assigns 0% luminance]
48
Slide rule (Rechenschieber)
[Figure: two histogram scales, one for action movies and one for comedies, with tick marks between 0 and 1; caption: merge histograms “zipper style”]
49
Histogram-based interfaces
[Figure: histogram-based editor with Save and Undo buttons and a legend (hot!, selected, rejected) for the queries Martial arts, Comedy shows, Entertainment, and Sports, next to a program list (Terminator 2, Dead Poets Society, Amazons on Mars, cut-off line, Le Grand Bleu, Back to the Future). Status lines shown in the editor:]
– 14 out of 14 martial arts programs per week selected
– 536 out of 536 comedy shows per week selected
– 512 out of 914 movies per week selected
– 32 out of 333 sports programs per week selected
– Overall: 1094 out of 1797 programs per week selected
50
The jelly interface
[Figure: jelly interface with the columns News, Comedy, and Action, a region labeled “Selected for output”, and Save, Undo, and Auto buttons]
Overall: 32 out of 59 programs per week selected
51
STUFF
QSA vs. related work
53
QSA can emulate some of them
• SDI systems (Selective Dissemination of Information)
• Rule-based systems
• Stereotype-based systems
• Automated collaborative filtering systems
Short break?
Chapter 4: User interfaces
Normalization and interest intensity editors
1. Form-based
2. Histogram-based
3. and Paintable Interfaces
56
Parameters users know
• Interest intensities: “How important is that query to you?”
• Amounts of objects: “How many objects do you want from that query?”
57
Relating histograms to each other
Moving arrows
Moving histograms
58
What is in and what is not?
2. Dead Poets Society
1. Bayern-Manchester
2. Amazons on Mars
-------------------------------
2. Le Grand Bleu
1. Sat1 ran
2. Back to the Future
59
Comparison
[Figure: the compared editors, labeled 0F, 1F, 2F, 1H, and 2H]
60
Results
• 2D preferred over 1D
• Computer experts preferred the more powerful histogram-based editors
• Computer novices preferred form-based editors
[Figure: two charts, 2F and 2H, of number of subjects (0 to 5) over rating (1 to 9, from horrible to wonderful), with separate series for computer novices and computer experts]
Chapter 5: TV Scout
• TV compared to other application areas
• TV Scout user interface overview
• Gathering implicit feedback
• The TV Scout query classes
TV Scout User interface
73
Query classes: applicability
Columns, left to right: Text search | Genres | Popularity | ACF | User tips | Editor tips (User tips and Editor tips are the opinion leader classes). Numbers in brackets are the footnote markers of the original table.
Task
– Find known program, e.g. The Matrix: ++ | o | -- | -- | -- | - [1]
– Find information on topic, e.g. Clinton: ++ | + | -- | -- | -- | --
– Find specific entertainment, e.g. Action: o [2] | ++ | -- | -- | o [3] | -
– Find any program user will like: - | o | o [5] | ++ [7] | + [4] | o [5]
– Find any liked program broadcast now, for pastime (provides high coverage): -- [6] | - [6] | o [5] | ++ [7] | -- [6] | -- [6]
Situation
– Appropriate for inexperienced users (ease of manually finding right query): + [8] | ++ | ( ) [9] | ( ) [9] | o [10] | ++
– Works when retention tool is empty (prediction quality during cold-start): ++ | ++ | ++ | -- [11] | ++ | ++
– Works when system has few users (prediction q. while not critical mass): ++ | ++ | - | -- | o [12] | ++
– Works for early raters (long-time planners/opinion leaders): ++ | ++ | - | -- | +/- [13] | + [13]
– Efficiently identifiable as outdated after interest change (specificity & naming): ++ | ++ | o [14] | -- | o [15] | ++
Chapter 6: Conclusions
75
Thanks
• Dieter Boecker
• Uli Thiel
• Matthias Hemmje
END
NOT INCLUDED
78
Classification of IF systems
[Figure: classification of IF systems. Labels: Objects, User, feedback, rated objects, rated stereotypes, rated attributes, feature extraction, stereotype expansion, attributes, matching, Profile (= objects), Profile (= stereotypes), Profile (= attributes)]
79
Bar chart histogram
[Figure: a bar chart b and a histogram h, related through the function f; axis labels: rating, count, rank]
Invert: $f(x) = b^{-1}(x)$ and $b(x) = f^{-1}(x)$
Integrate: $f(x) = -\int h(x)\,dx$
Differentiate: $h(x) = -\frac{df}{dx}$
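A discrete analogue of these relations can be sketched as follows. The sketch assumes distinct ratings, rank 1 for the best-rated object, and f(x) defined as the number of objects rated at least x; these conventions and the function names are assumptions, chosen so that the histogram is the negative difference of f.

```python
from typing import List, Sequence


def bar_chart(ratings: Sequence[float]) -> List[float]:
    """b: rank -> rating (index 0 holds rank 1, the best-rated object)."""
    return sorted(ratings, reverse=True)


def rank_of(ratings: Sequence[float], x: float) -> int:
    """f: rating -> rank, taken as the number of objects rated at least x."""
    return sum(1 for r in ratings if r >= x)


def histogram(ratings: Sequence[float], edges: Sequence[float]) -> List[int]:
    """h: count per bin [lo, hi), computed as the negative difference of f."""
    return [rank_of(ratings, lo) - rank_of(ratings, hi)
            for lo, hi in zip(edges, edges[1:])]


ratings = [0.9, 0.7, 0.4, 0.2]
b = bar_chart(ratings)
# Inversion: looking up the rating at rank k and asking for its rank gives k.
ranks = [rank_of(ratings, b[k - 1]) for k in range(1, len(b) + 1)]
h = histogram(ratings, [0.0, 0.5, 1.0])
```

With tied ratings the inverse is no longer unique, so the inversion check only holds under the distinct-ratings assumption.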
80
Email Profile Editor
81
Channel interface toggle look
82
Banner advertising dialog
Daily life
Shopping
Apparel
Food
Cosmetics
Multimedia
Music
Games
Movies
Concerts
Books
Computer
Hardware
Software
Internet
Services
Electronics
Telecomm.
TV
Video
Hi-fi
Mobility
Cars
Flights
Trains
Last minute
Hotels
Money
Insurance
Stocks
Services
Contact
Jobs
Friends
Dating
Classifieds
Sports&Fun
Sports
Clubs
Traveling
Infotainment
News
Magazines
Media
Competition
Free stuff
Banking
done undo
83
Toggle tree maps
94
Interest changes in literature
• Gradual changes [Belkin 92, Baclace 91, Lang 95, ...]
– Consequence of processes, e.g. as people age
– Example: Favorite TV series
• Abrupt changes [Marchionini 95, Lam 96, Frisse 89...]
– Consequence of events
– Example: Actor quits series
• Temporary variations [Allen 90, Loeb 92, Kay 95, …]
– Mood changes
– Example: In the mood for an action movie
How to tackle the problem?
Learn from interactive computer graphics
96
Computer graphics vs. Info filtering
What is in user’s head…
– Computer graphics: image = assigns color value to image coordinates
– Information filtering: relevance function = assigns relevance value to objects
…is modeled as
– Computer graphics: digital image (by graphics programs)
– Information filtering: user profile (by IF systems)
Sampling-oriented
– Computer graphics: image = bitmap images (“Painting”)
– Information filtering: profile = set of relevance feedback… (ACF, feature extract.)
Object-oriented
– Computer graphics: image = set of graphical primitives (“Drawing”)
– Information filtering: profile = set of rules etc. (rule-based systems)
97
Interactivity in computer graphics
• IF: Interest changes are not known in advance
• => CG: interactive animation, e.g. video games
98
Interactivity in computer graphics
99
Interactivity in computer graphics
100
Requirement: Detail and interactivity
• Requirements
– high interactivity (rapid reaction to input)
– graphical quality
• Video games: scene graph and bitmaps
– Bitmaps for the details
– Scene graph for the modifiability
• Application programs: Assimilate characteristics of other approach
– Drawing programs => texture maps [Foley 90].
– Painting programs => layers [Adobe].
101
Benefit from using layers in CG
• Creating all layers >= painting a single frame.
• … but it pays off when the scene is animated
– Represent change in scene graph (translate, fade in or out, or tint a layer, …)
– Update only selected layers
• Group into one layer what will change as a whole
102
Transfer the idea
• Transfer the idea of 2D animation to information filtering
– n layers => n queries (Query = “a function that assigns ratings to objects”)
– Scene graph => “Aggregation function”
103
What if overlap is substantial?
• The WRIR model assumes mutual independence
• This is not always justified
– Two queries are used in a data fusion way (=> redundancy)
– Action, comedy, but user dislikes action comedies (=> implicit interest)
• => Use model that can learn relation between queries
106
Model 2: Implementation as inference network
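[Figure: two inference networks, each with nodes r1 … rm and d1 … dj. In the first, the QSA profile consists of the queries q1 … qn below a node I. In the second, the QSA profile consists of the queries q11 … q2n below the nodes I1 … Ik and a node A]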
107
Learning inference network
• [Baclace 91]
• Simple “agents” represent each query
• Complex “agents” represent conjunctions of queries
• Agents learn from relevance feedback what this query match or combination is worth
109
Normalization in image processing
110
Demo levels dialog
[Figure: levels dialog with callouts a to f]
111
Results of user study
• What confuses users is the surface property: “What does the height of these boxes mean?”
• They had recognized bar charts, not histograms
• => Better give up bar chart look
• Which real-world object has the right properties?
– Deformable…
– …but not compressible (constant volume)
– Preserves its shape when deformed
112
Histograms help combining knowledge
[Figure labels: output ranks, query ranks, output ratings, query ratings, histogram set, individual histograms, objects (if displayed)]
System needs to compute aggregation function
113
Inserting queries in QSA profile
114
TV Scout
115
Some design possibilities
[Figure: four alternative layouts of the same query map, with the queries Information, Sports, Beverly Hills 90210, Soap, Comedy, Martial arts, Action movies, Schwarzenegger, Simpsons, M.A.S.H., Basketball, Classic music, Theater, Golf, Talk, and Series]
116
Painting (instead of multiple select)
• Use different colors to express different degrees of like or dislike
[Figure: query map (Information, Sports, Comedy, “Action AND Comedy”, Action movies, Endorsed by Lars, Endorsed by Paul, M.A.S.H., Series, Basketball, Schwarzenegger, Beverly Hills 90210, Classic music, Theater, Golf) shown in several painting steps]
117
Semantic space layout
• Layout according to geographic location of TV stations
118
3D and 4D paintable interfaces
• Domains with natural n-dimensional structure
• Display in n-d
• Explosion displays keep 2-d painting applicable
119
[Figure: system architecture diagram. Labels: content provider, program descriptions, movie database, program description database, query subsystems (editors’ tips, user tips, text search, genres, estim. pop., ACF, ad hoc query), exact match filtering (date, time profile, channel profile, time dialog, channel dialog), QSA filtering (QSA profile, feedback), retention tools (video labels, laundry list)]
120
Query-executing subsystems
• Use everything that returns (object, rating) pairs
• Can use retrieval systems, but also others
• TV Scout
– Genres, hand-made function in Sybase database
– Text searches run in FreeWAIS
– Editor’s recommendations imported from print magazine
– User tips done by users
– Plug in more query-executing subsystems at any time
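The “everything that returns (object, rating) pairs” contract can be sketched as a small interface. The classes below are toy stand-ins with hypothetical names and data; they do not call Sybase or FreeWAIS and only illustrate how differently built subsystems could be plugged in side by side.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

RatedObject = Tuple[str, float]   # (object, rating) pair


class QuerySubsystem(Protocol):
    """Assumed contract: anything whose run() yields (object, rating) pairs."""
    def run(self) -> Iterable[RatedObject]: ...


class GenreQuery:
    """Toy stand-in for a hand-made genre function."""
    def __init__(self, genres_by_title: Dict[str, List[str]], genre: str) -> None:
        self.genres_by_title = genres_by_title
        self.genre = genre

    def run(self) -> List[RatedObject]:
        return [(title, 1.0) for title, genres in self.genres_by_title.items()
                if self.genre in genres]


class TextSearch:
    """Toy stand-in for a text search engine: case-insensitive substring match."""
    def __init__(self, descriptions: Dict[str, str], term: str) -> None:
        self.descriptions = descriptions
        self.term = term

    def run(self) -> List[RatedObject]:
        term = self.term.lower()
        return [(title, 1.0) for title, text in self.descriptions.items()
                if term in text.lower()]


class ImportedTips:
    """Toy stand-in for recommendations imported from an external source."""
    def __init__(self, tips: Dict[str, float]) -> None:
        self.tips = tips

    def run(self) -> List[RatedObject]:
        return list(self.tips.items())


def run_all(subsystems: Dict[str, QuerySubsystem]) -> Dict[str, Dict[str, float]]:
    """Collect the query ratings of every plugged-in subsystem."""
    return {name: dict(subsystem.run()) for name, subsystem in subsystems.items()}


# Hypothetical toy data.
genres = {"Terminator 2": ["action"], "M.A.S.H.": ["comedy", "series"]}
descriptions = {"Terminator 2": "A cyborg protects a boy.",
                "M.A.S.H.": "Army doctors in a field hospital."}
subsystems: Dict[str, QuerySubsystem] = {
    "genre: action": GenreQuery(genres, "action"),
    "text: doctors": TextSearch(descriptions, "Doctors"),
    "editor tips": ImportedTips({"M.A.S.H.": 0.8}),
}
results = run_all(subsystems)
```

Adding another subsystem means adding one more entry to the dictionary, as long as it honors the same contract.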
121
[Figure: three plots (a, b, c) of output rating over query rating. Labels: rating of top-ranked object, rating of bottom-ranked object, cut-off, amount-defined, rating defined]