Toward Mixed-Initiative Clustering
Yifen Huang, Tom M. Mitchell
Carnegie Mellon University
Agents that Learn from Human Teachers, March 23, 2009
Semi-supervised clustering: A user performs an oracle role.
Unsupervised clustering: A machine builds the model alone.
Mixed-Initiative Clustering
Key Question
• How can autonomous clustering algorithms be extended to enable mixed-initiative clustering approaches involving an iterative sequence of computer-suggested and user-suggested revisions to converge to a useful hierarchical clustering?
– From autonomous clustering to mixed-initiative clustering
– From flat feedback to hierarchical feedback
Activity X contains this list of emails: ## ## ## ## ##
An email from Andrea Thomaz belongs to your AAAI symposium activity.
Adam Cheyer is a key-person to your CALO activity.
Too lazy to comment.
Activity X contains this list of emails: ## ## ## ## ##
An email from Andrea Thomaz belongs to your AAAI symposium activity.
What the hell is this?? DELETE!
This is correct.
Adam Cheyer is a key-person to your CALO activity.
Computer-to-user language: hypotheses
User-to-computer language: modified hypotheses
Model adaptation algorithm
Framework for Mixed-Initiative Clustering
User Interface
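The framework above alternates between computer-suggested hypotheses, user-modified hypotheses, and model adaptation. A minimal sketch of that loop follows, assuming the model is just a document-to-cluster mapping and that the session ends when the user changes nothing. The names mixed_initiative_loop, propose, scripted_user, and adapt are hypothetical and are not taken from the talk.

```python
def mixed_initiative_loop(model, propose, get_user_feedback, adapt, max_rounds=10):
    """Alternate machine proposals and user revisions (illustrative only)."""
    for _ in range(max_rounds):
        hypotheses = propose(model)               # computer-to-user language
        modified = get_user_feedback(hypotheses)  # user-to-computer language
        if modified == hypotheses:                # assumed stopping rule: no revisions
            break
        model = adapt(model, modified)            # model adaptation step
    return model


# Toy stand-ins: the "model" is a plain dict of document -> cluster.
assignments = {"email-1": "A", "email-2": "A", "email-3": "B"}
corrections = {"email-2": "B"}  # what a scripted user would fix


def propose(model):
    return dict(model)


def scripted_user(hypotheses):
    revised = dict(hypotheses)
    revised.update(corrections)
    return revised


def adapt(model, modified):
    # A real adaptation algorithm would retrain; here we just accept the edits.
    return dict(modified)


final = mixed_initiative_loop(assignments, propose, scripted_user, adapt)
print(final)
```

A real system would replace adapt with retraining of the clustering model under the user's feedback; the sketch only shows the control flow.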
Communicative Languages in Semi-Supervised Clustering
[Diagram: Cluster and Document properties; user feedback types Confirm and Remove]
Enriching Languages in Flat Clustering
[Diagram: Cluster, Document, Word, and Person properties; user feedback types Confirm and Remove]
Enriching Languages in Hierarchical Clustering
[Diagram: Cluster, Document, Word, and Person properties; user feedback types Confirm, Remove, Move, Merge, Add, and Split]
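One way to picture the enriched user-to-computer language is as a small record type. The sketch below uses the property types (Cluster, Document, Word, Person) and feedback types (Confirm, Remove, Move, Merge, Add, Split) named on these slides. The class and field names are hypothetical, and because the slides do not spell out here which feedback type applies to which property, the sketch allows any combination.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PropertyType(Enum):
    CLUSTER = "cluster"
    DOCUMENT = "document"
    WORD = "word"
    PERSON = "person"


class Action(Enum):
    CONFIRM = "confirm"
    REMOVE = "remove"
    MOVE = "move"
    MERGE = "merge"
    ADD = "add"
    SPLIT = "split"


@dataclass(frozen=True)
class Feedback:
    """One unit of user feedback (hypothetical representation)."""
    action: Action
    property_type: PropertyType
    target: str                        # the item the feedback is about
    cluster: str                       # the cluster the hypothesis refers to
    destination: Optional[str] = None  # only meaningful for MOVE in this sketch

    def __post_init__(self):
        if self.action is Action.MOVE and self.destination is None:
            raise ValueError("a move needs a destination cluster")


# "Adam Cheyer is a key-person to your CALO activity." / "This is correct."
confirm_person = Feedback(Action.CONFIRM, PropertyType.PERSON, "Adam Cheyer", "CALO")

# A move between two made-up clusters, for illustration only.
move_doc = Feedback(Action.MOVE, PropertyType.DOCUMENT, "email-17",
                    cluster="cluster-3", destination="cluster-5")
print(confirm_person)
print(move_doc)
```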
Experiment Design
• Can mixed-initiative clustering help a user achieve the result faster?
• Can mixed-initiative clustering help a machine build a better model?
Dataset
• An email dataset of one of the authors
– 623 emails
– 6684 unique words and 135 individual people
– Manually sorted into a hierarchy of 15 cluster nodes, including a root, 3 intermediate nodes and 11 leaf nodes
Feedback Sessions
• Five initial hierarchical clustering results
• Two feedback sessions on each result
– Diligent session
– Lazy session
Diligent User
[Diagram: the hierarchical feedback language (Cluster, Document, Word, Person; Confirm, Remove, Move, Merge, Add, Split)]
Lazy User
[Diagram: the hierarchical feedback language (Cluster, Document, Word, Person; Confirm, Remove, Move, Merge, Add, Split)]
Lazy User vs. Diligent User
Measurement
• User feedback is equivalent to edge modification.
• Edge Modification Ratio (EMR) is the fraction of edges that must be modified to reach the reference hierarchy.
[Figure: example hierarchy with 29 nodes (1 to 29) and 28 edges (e1 to e28)]
Considering hierarchical accuracy with user feedback
Edges to modify: e3, e8, e12, e18, e21
EMR = 5 / 28 ≈ 0.18
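The EMR measurement can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: each hierarchy is represented as a node-to-parent mapping (root omitted), both hierarchies contain the same nodes, an edge counts as needing modification when a node's parent differs from the reference, and the denominator is the number of edges in the reference hierarchy. The function name edge_modification_ratio is hypothetical.

```python
def edge_modification_ratio(current, reference):
    """Fraction of edges whose parent differs from the reference hierarchy.

    Both arguments map node -> parent; the root has no entry.
    Assumes both hierarchies contain exactly the same non-root nodes.
    """
    if not reference:
        raise ValueError("reference hierarchy has no edges")
    if set(current) != set(reference):
        raise ValueError("hierarchies must contain the same nodes")
    changed = sum(1 for node, parent in reference.items()
                  if current[node] != parent)
    return changed / len(reference)


# Toy hierarchy: root "r", clusters "a" and "b", four documents.
reference = {"a": "r", "b": "r", "d1": "a", "d2": "a", "d3": "b", "d4": "b"}
current = dict(reference)
current["d2"] = "b"  # one document sits under the wrong cluster

emr = edge_modification_ratio(current, reference)
print(round(emr, 3))
```

With 28 edges of which 5 need to change, as in the slide's example, the same function gives 5 / 28, which rounds to 0.18.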
Good Results (4/5)
Bad Result (1/5)
One More Step Toward Mixed-Initiative Clustering
Low-Latency Mixed-Initiative Clustering
Elaborated Framework for Mixed-Initiative Clustering
Future Work
• Feasibility study of the low-latency mixed-initiative interface