IUI 2015: Personalized Search: Reconsidering the Value of Open User Models
Peter Brusilovsky

Conventional UMs: Black Box


Open UM


Weber, G. and Brusilovsky, P. (2001) ELM-ART: An adaptive versatile system for Web-based instruction. International Journal of Artificial Intelligence in Education 12 (4), 351-384.


Ahn, J.-w., Brusilovsky, P., Grady, J., He, D., and Syn, S. Y. (2007) Open user profiles for adaptive news systems: help or harm? In: 16th international conference on World Wide Web, WWW '07, Banff, Canada, May 8-12, 2007, ACM, pp. 11-20


https://www.youtube.com/watch?v=Yt1fMEFlLVA&index=2&list=PLyCV9FE42dl7JG_i7m_kvwuYRpfwwJ4iY


STUDY DESIGN

A user study was conducted to test the advantages of Adaptive VIBE+NE's concept-based visual open user modeling. The study was designed to simulate the work situation of an information analyst engaged in a sufficiently complex exploratory search with information foraging and sense-making stages [21]. The tasks and documents were provided by an expanded TDT4 (Topic Detection and Tracking) document collection that contains 28,390 English documents published from October 2000 to January 2001. The original TDT4 topics were enriched to resemble the tasks performed by intelligence analysts [16], yielding 18 topics with ground-truth information. To select topics of roughly equal difficulty and make them comparable with each other, we devised a measure based on the distribution of ground-truth information in the corpus. In some topics, answers are concentrated in a small number of documents (easier, because users need to explore fewer documents), whereas other topics disperse answers across many documents (more difficult). We therefore defined the standard deviation of the relevant passage count per document as a pseudo topic complexity measure (Equation 1). Three TDT4 topics with equivalent topic complexity were selected as the study topics (Table 1).

Complexity_{topic} = \sqrt{ \frac{1}{|Doc_{rel}|} \sum_{i=1}^{|Doc_{rel}|} \left( |Passage_{rel,i}| - \mu \right)^2 }    (1)

where \mu is the mean relevant passage count over the relevant documents Doc_{rel}.
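Under this definition, the complexity score is simply the (population) standard deviation of the relevant-passage counts across a topic's relevant documents. A minimal sketch of Equation 1; the passage counts below are made-up illustrative data, not the TDT4 ground truth:

```python
import math

def topic_complexity(passage_counts):
    """Pseudo topic complexity (Equation 1): the population standard
    deviation of the relevant-passage count per relevant document."""
    n = len(passage_counts)
    mu = sum(passage_counts) / n                      # mean passage count
    return math.sqrt(sum((c - mu) ** 2 for c in passage_counts) / n)

# Hypothetical relevant-passage counts for two topics (illustrative only).
print(round(topic_complexity([5, 5, 5, 5]), 2))    # 0.0: identical counts
print(round(topic_complexity([20, 2, 1, 1]), 2))   # 8.09: skewed counts
```

The same value could be obtained with `statistics.pstdev`; the explicit form above just mirrors the equation term by term.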

The participants were recruited from the University of Pittsburgh and Carnegie Mellon University. They were expected to play the role of information analysts who could proficiently operate the search systems to complete the tasks, and were required to meet the following criteria: (1) native English speakers or equivalent language abilities; (2) sufficient information retrieval skills; (3) sufficient intellectual ability as analysts, judged from their educational backgrounds [4]. Thirty-three participants (23 males and 10 females) were recruited and asked to complete search tasks across two sessions. The average age of the participants was 26.1 (standard deviation = 6.5).

They were asked to solve two tasks under two conditions: (1) Adaptive VIBE and (2) Adaptive VIBE+NE.¹ The order of the systems was randomized by a Latin square design to avoid learning or fatigue effects. Participants read a one-page introductory statement and completed a demographic questionnaire focusing on their search experience and news consumption. A 50-minute training session followed, in which the experiment coordinator explained the system features and let participants solve a realistic training task. The training session was important because the Adaptive VIBE visualization systems were new to the participants and they needed practice. The main search sessions were 20 minutes each (40 minutes in total), and no attempt to complete the tasks after the given time was allowed. After the sessions they were asked

¹ For simplicity, they are referred to as "VIBE" and "VIBE+NE" henceforth.

Introduction Statement → Entry Questionnaire → Training → Search Session 1 (System: VIBE or VIBE+NE) → Post-questionnaire → Search Session 2 (System: VIBE or VIBE+NE) → Post-questionnaire → Exit Interview

Figure 3: Study procedure

to fill out post-task questionnaires and took part in 10-minute exit interviews.

Table 1: Topic difficulty: distribution of relevant information

Topic ID            40009    40021                     40048
Complexity_topic    71.46    73.98 (most difficult)    64.12 (least difficult)

RESULT ANALYSIS

System Performance Analysis

Traditional information retrieval systems present search results as one-dimensional ranked lists and try to ensure that relevant documents are placed higher in the lists, where users have more chances to access them. Adaptive VIBE, in contrast, visualizes the retrieved documents on a 2D plane and spatially separates relevant documents from non-relevant ones, eventually drawing user attention to the relevant document group. In this section we explore two parameters, clustering and positioning, that measure how successfully the system visualizes relevant documents. The analysis addresses two questions: (1) to what extent the performance improvement can be attributed to the user model, i.e., whether performance increases with the gradual improvement of the user model over time, and (2) whether the NE POIs can lead to better system performance.

Visual separation of relevant documents. An important attribute of Adaptive VIBE is the ability to spatially separate


Table 2: Comparison of x-coordinates of relevant and non-relevant document cluster centroids

Mean x-coord    Rel       Non-rel    diff
Overall         458.03    379.55     78.48
VIBE            414.43    372.36     42.07
VIBE+NE         492.58    397.66     94.92

relevant and non-relevant documents, placing relevant document clusters closer to the user model (to the right in Figure 1). This is similar to the behavior of search systems that promote relevant documents to the top of their lists; because Adaptive VIBE displays documents in a 2D space, the user model POIs play a role equivalent to the top positions of a ranked list. This feature is achieved by layouts that spatially separate the query POIs and the user model POIs in the visualized document space. Previous studies [2, 4] have already observed this property for keyword-only VIBE user models, and the separation was amplified by adding NEs to the VIBE user models in [3]. However, that result was obtained in a simulation using log data from a text-based personalized search study [7]. Therefore, we examined whether the simulation results could be replicated in a user study.

For VIBE and VIBE+NE, we recorded all visualization snapshots shown to the participants during the user study and examined whether the same results were observed in actual search tasks. Table 2 compares the centroid locations of relevant and non-relevant document clusters. The relevance of the documents was, of course, invisible to the participants and was marked for the post-study analysis. The locations are the x-coordinates of the documents; because the user model POIs are located to the right of the visualization and the coordinates follow the usual Java 2D convention, where the top-left point is (x = 0, y = 0), bigger values in this table mean the centroids are located closer to the user models. The relevant document cluster centroids clearly formed closer to the user models than the non-relevant ones (458.03 versus 379.55; Wilcoxon rank sum test, p < 0.001). Moreover, the NE-based user models (VIBE+NE) show a bigger separation (94.92 pixels) than the keyword-based user models (VIBE, 42.07 pixels), and the relevant document cluster in VIBE+NE is even closer to the user models than in VIBE (492.58 versus 414.43). The difference is statistically significant (Kruskal-Wallis rank sum test, p < 0.001). This result confirms the earlier simulation result [3] and suggests that VIBE+NE has stronger relevant-document discrimination power than baseline VIBE. Table 3 provides another metric of document cluster quality: the Davies-Bouldin Validity Index (DB-index) [13], which measures clustering quality by comparing within-cluster document-to-centroid distances against between-centroid distances. A smaller DB-index value means a better clustering result. Overall, VIBE+NE shows a significantly better DB-index, even though the difference is small (Kruskal-Wallis rank sum test, p < 0.001).

Table 3: Comparison of DB-index between systems

System     VIBE    VIBE+NE    p
Overall    1.70    1.69       < 0.001

Figure 4: Movement of top-10 documents through search progress: compared by system (VIBE vs. VIBE+NE)

Figure 5: Movement of top-10 documents through search progress: compared by topic
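For the two-cluster case analyzed here (relevant vs. non-relevant), the DB-index reduces to the ratio of the summed within-cluster scatters to the distance between the two centroids. A pure-Python sketch; the document coordinates below are fabricated screen positions, not the recorded snapshots:

```python
import math

def centroid(points):
    """Mean (x, y) position of a cluster of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def db_index_two_clusters(a, b):
    """Davies-Bouldin index for two clusters:
    (scatter_a + scatter_b) / distance(centroid_a, centroid_b).
    Smaller values mean tighter, better-separated clusters."""
    ca, cb = centroid(a), centroid(b)
    scatter = lambda pts, c: sum(math.dist(p, c) for p in pts) / len(pts)
    return (scatter(a, ca) + scatter(b, cb)) / math.dist(ca, cb)

# Hypothetical document positions (x, y) on the visualization plane;
# relevant documents drawn toward the user-model side (larger x).
relevant = [(480, 200), (500, 240), (470, 220)]
non_relevant = [(360, 210), (380, 250), (400, 230)]
print(round(db_index_two_clusters(relevant, non_relevant), 3))  # 0.411
```

The same quantity is available as `davies_bouldin_score` in scikit-learn for the general k-cluster case; the explicit two-cluster form above matches the definition cited from [13].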

Position of system recommended documents. In addition to guiding users toward relevant documents placed closer to the visual user models, Adaptive VIBE estimates document relevance by combining query-document and user-model-document similarity scores. The top-most highly scored documents are painted in blue (see Figure 1), so that users can easily identify system-recommended documents. It is im-

visual separation of relevant docs


X-coordinates of relevant/non-relevant document cluster centroids

DB-index comparison between VIBE and VIBE+NE (smaller DB-index means better clustering)


position of system recommended docs


Open document precision

Opened relevant document count


User note precision


UM manipulation vs. performance


Figure 6: Comparison of relevant document open count (CDF over the number of opened relevant documents, by system: VIBE vs. VIBE+NE)

closer to more similar POIs in Adaptive VIBE, and their positions are updated dynamically while users drag related POIs. This dynamic visualization feature lets users manipulate the layout of the user model POIs and instantly learn the effect of the manipulation on the retrieved documents. Therefore, we compared the participants' POI movement (POI dragging) event counts with system and user precision. System precision was calculated as the precision of the top-10 documents, and user precision as the precision of the documents users opened. Figure 7 shows the correlation between POI manipulation counts and system/user precision. The regression lines suggest no statistical evidence that POI manipulation degraded system or user performance. Figures 8 and 9 break down the overall correlations into keyword POI and NE POI correlations respectively. Among them, only the keyword-user precision correlation is significantly negative (Figure 8 (below), p = 0.0179). However, the system precision (Figure 8 (above)) still does not show any significant degradation. This suggests that the system maintained high performance regardless of user POI manipulation, but the users eventually made wrong decisions. In fact, the R-squared score is relatively low (R² = 0.179), and the graph shows that a few outliers produced the negative result. Moreover, NE POI manipulations show no performance degradation (Figure 9), which hints at the advantages of semantic named entities during user manipulation of visual user model elements. These analyses suggest that visualization-based open user model manipulation could overcome the disadvantage of text-based open and editable user modeling observed in the previous studies.
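The regression analysis described above amounts to fitting precision against POI-drag counts and inspecting the slope and R². A pure-Python least-squares sketch; the per-participant arrays below are fabricated stand-ins for the logged study data, and a full analysis would also test the slope's p-value as the paper does:

```python
def ols_slope_r2(x, y):
    """Least-squares slope of y on x and the fit's R^2
    (R^2 here equals the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Hypothetical per-participant POI drag counts and user precision.
drags = [2, 5, 8, 11, 14, 17, 20, 23, 26, 30]
precision = [0.62, 0.55, 0.60, 0.58, 0.52, 0.57, 0.50, 0.54, 0.49, 0.51]

slope, r2 = ols_slope_r2(drags, precision)
print(slope, r2)  # a shallow negative slope with a modest R^2
```

As in the study, a significant but low-R² negative slope should be read cautiously: the trend explains little of the variance and can be driven by a few outliers.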

Table 7: Subjective feedback: positive reactions

System            VIBE    VIBE+NE
Average Score     3.18    3.39
SD                0.98    0.93
Positive count    4       9

Figure 7: POI movement versus performance (all POIs)

Table 8: Subjective feedback: topic difficulty

Topic                    40009    40021    40048
Mean Topic Difficulty    2.77     3.27     2.68

Subjective Feedback

We asked the participants to rate VIBE and VIBE+NE on a 5-point Likert scale (1 = dislike, 5 = like). The average difference between the two systems was close to significant (Table 7, Kruskal-Wallis rank sum test, p = 0.053). Table 7 also compares relative positive reactions, showing how many subjects preferred VIBE or VIBE+NE: more participants preferred VIBE+NE (9) than VIBE (4), while the rest (20) were tied. Table 8 compares the topic difficulty scores reported by the participants (1 = easy, 5 = difficult). Despite our attempt to make the three topics equally difficult, the participants perceived differences: the most difficult topic was 40021 and the easiest was 40048. This result was statistically significant (Kruskal-Wallis rank sum test, p = 0.032).
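The Kruskal-Wallis test used for these ratings can be sketched in pure Python. The rating lists below are fabricated, not the study responses, and this sketch computes only the H statistic (with tie-averaged ranks); a p-value would come from comparing H to a chi-squared distribution with k−1 degrees of freedom:

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic for k independent samples,
    using tie-averaged ranks (no tie-correction factor)."""
    values = [v for g in groups for v in g]
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):                      # assign average ranks to ties
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1             # 1-based average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n = len(values)
    h, idx = 0.0, 0
    for g in groups:                           # sum of (rank-sum^2 / n_i)
        rank_sum = sum(ranks[idx:idx + len(g)])
        h += rank_sum ** 2 / len(g)
        idx += len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

# Hypothetical 5-point Likert ratings for two systems.
vibe = [3, 3, 4, 2, 3, 4]
vibe_ne = [4, 3, 4, 5, 3, 4]
print(kruskal_h(vibe, vibe_ne))
```

In practice `scipy.stats.kruskal` does the same computation (plus a tie correction and the chi-squared p-value); the explicit version shows what the statistic measures.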


User preference on two systems (1=dislike, 5=like)

Topic difficulty (subjective)
