Query Formulation and Reformulation: Challenges and Possible Solutions

INSC 540 Final Project
Instructor: Dr. Wanda Pratt
Presenter: Sabrina Hsueh

Outline
• Points taken from class discussion/reading
• The iterative information access process
• The challenge
– The gap
– How to bring the query closer to user needs
• Possible solutions to the challenges
• Lessons learned and links to our project

Differences Discussed in Class: Web IR vs. OPAC

Content
• Multi-language
• Uncontrolled (sources not differentiated)
• Unverified
• Multimedia
• Ever-growing

Users
• Search in an unmediated fashion
• Less expertise in search
• More inclined to browsing
• Diverse backgrounds/goals

Query
• Shorter
• Higher expectation on the first result

Matching user needs with information sources is an Iterative Process. So is query formulation and reformulation.

[Diagram: the user's info need becomes a query to the search engine, which returns results drawn from the info sources; the user then issues a reformulated query and receives new results, closing the loop between the user query and the content.]
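The iterative loop just described (query → results → reformulated query) can be sketched in code. Everything here is illustrative: `search`, `judge`, and `reformulate` are hypothetical stand-ins for the engine, the user's relevance judgment, and the reformulation step, not part of any real API.

```python
def iterative_search(info_need, search, judge, reformulate, max_rounds=5):
    """Sketch of the iterative query formulation/reformulation loop."""
    query = info_need                        # first verbalization of the need
    results = []
    for _ in range(max_rounds):
        results = search(query)              # engine matches query to info sources
        if judge(results, info_need):        # user: does this satisfy the need?
            break
        query = reformulate(query, results)  # otherwise refine the query and retry
    return results
```

In practice the `judge` and `reformulate` steps are where both the user-side and engine-side challenges below arise.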

How to bring query closer to user info needs?

Challenges for Users
• How to specify info needs
• Unfamiliarity with:
– Underlying system operations
– Vocabulary systems used to describe information sources in the database (e.g., Yahoo's category scheme; term inconsistency such as "dungarees" vs. "jeans", or "Dr. Karen Fisher" vs. "Dr. Karen Pettigrew")
– The nature of the database (e.g., Google's separate web, image, group, directory, and news search)


Challenges for the Search Engine
• Across users:
– Diverse user tasks
– Diverse user characteristics
– Diverse user motivations
• Single user, across sessions:
– Diverse user tasks
• Single user, within a session:
– Dynamic search process
– Vocabulary problem


To Understand Web IA Users and Queries Better By…

1. Characterizing users' Web IA model
1.1 User behavior modeling (in general)
1.2 The nature of the iterative query reformulation process (specific)
2. Characterizing the Web query
2.1 Statistics of Web IR logs

1. Characterizing Users’ Web IA Model

1.1 User behavior modeling (in general)
• Berry-picking model and an integrated model of info seeking and searching (Bates, M.J., 1989, 2002)
• Behavioral taxonomy of info seeking on the Web (Choo et al., 1999, U. of Toronto)
• Web surfing patterns and regularity (Huberman, B.A., et al., 1998, HP)
• Focus on knowledge workers (Sellen, A.R., et al., 2002)

1. Characterizing Users’ Web IA Model

1.2 The nature of iterative query reformulation process (specific)

• Evaluations of relevance feedback processes and mechanisms (Belkin, N.J., et al., 1995–present, Rutgers)
• Empirical data for query reformulation (Bruza, P.D. and Dennis, S., 1997)
• User-based evaluation of query expansion (Efthimiadis, E., 1996, 2000)

2. Characterizing Web Query

2.1 Statistics of Web IR logs
• Silverstein, C., et al. (1998) (AltaVista)
• Jansen, B.J., et al. (2000) (Excite)
• Anick, P. (2003) (AltaVista)

What if some aiding devices were provided in the middle to help…

• (both users and systems to) formulate better queries for users' info needs
• (users to) use the system's resources more effectively


Possible Solutions to the Problem
• Automatic/interactive query reformulation
• Automatic web usage mining
• Automatic search context modeling
• Interactive term suggestions


Possible Solutions to the Problem (1): Automatic/Interactive Query Reformulation
• Users preferred more understanding of, and control over, relevance feedback and ranking (Belkin et al., 1996)
• Almost completely inexperienced users can choose from the terms offered (Koenemann, J., 1996)
• A combination of automatic/interactive and explicit/implicit uses of a thesaurus, with relevance judgments on documents/terms (Beaulieu, M., 1997)

[Diagram: a user interface for reformulation and a system query-reformulation module sit between the user and the search engine core.]
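The slides do not commit to a particular relevance-feedback formula, but one classic automatic query-reformulation method that fits this picture is Rocchio's: the query vector moves toward the centroid of judged-relevant documents and away from non-relevant ones. A minimal sketch over sparse term-weight dicts, using the conventional default weights:

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio reformulation: q' = alpha*q + beta*mean(rel) - gamma*mean(nonrel).

    Queries and documents are sparse {term: weight} dicts.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)
    new_query = {}
    for t in terms:
        weight = alpha * query.get(t, 0.0)
        if relevant:      # move toward the centroid of judged-relevant docs
            weight += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:   # move away from the centroid of non-relevant docs
            weight -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        new_query[t] = max(weight, 0.0)  # negative weights are conventionally dropped
    return new_query
```

In an interactive system, the user would see the resulting term weights (and could veto them) rather than having the new query submitted silently, which is exactly the control issue the evaluations below discuss.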

Possible Solutions to the Problem (2): Automatic Web Usage Mining
• Bayesian user goal modeling: the Lumiere Project (Horvitz, E., et al., 1998)
• Learning which search engine to select: Savvy Search (Howe, A. and Dreillinger, D., 1997)
• Modeling multitasking users (Slaney, M., 2003)
• Modeling web users' interests by pages viewed (Zhu, T., et al., 2003)
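As a toy illustration of Bayesian user goal modeling in the spirit of (but not taken from) the Lumiere Project: infer a posterior over possible goals from observed actions via Bayes' rule, assuming actions are conditionally independent given the goal. The goals, actions, and probabilities below are all invented.

```python
# Hypothetical priors and likelihoods -- invented for illustration only.
PRIOR = {"buy": 0.3, "research": 0.7}
LIKELIHOOD = {
    "buy":      {"view_price": 0.8, "read_review": 0.4},
    "research": {"view_price": 0.2, "read_review": 0.9},
}

def goal_posterior(actions):
    """P(goal | actions) by Bayes' rule, treating the observed actions as
    conditionally independent given the goal (a naive-Bayes assumption)."""
    scores = {}
    for goal, prior in PRIOR.items():
        p = prior
        for action in actions:
            p *= LIKELIHOOD[goal][action]  # multiply in each observation
        scores[goal] = p
    total = sum(scores.values())
    return {goal: p / total for goal, p in scores.items()}  # normalize
```

A real system like Lumiere uses a full Bayesian network over time; this sketch only shows why observing "view_price" shifts belief toward a purchase goal despite its lower prior.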

Possible Solutions to the Problem (3): Automatic Search Context Modeling
• Machine learning approaches:
– Chen, H. (2003): concept map
– Lawrence, S. (NEC) (2000): information extraction, domain-specific processing, community identification, specialized search engine location, etc.
• Displaying the hierarchical relationship of the results:
– Cha-Cha (UC Berkeley) (1999)
– Dumais, S., et al. (Microsoft Research) (2001)

Possible Solutions to the Problem (4): Interactive Term Suggestions
• Relevance feedback and local context analysis (Belkin, N.J., et al., Rutgers, 2000)
• Log-based (Huang, C., Chien, L., et al., IIS, Taiwan, 2003; Anick, P., 2003)
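A minimal sketch of log-based term suggestion, loosely in the spirit of the session-log approaches above (the actual systems are considerably more sophisticated): rank candidate terms by how often they co-occur with the seed term within the same query session. The session data and term names are invented.

```python
from collections import Counter

def suggest_terms(sessions, seed, k=3):
    """Suggest terms that co-occur with `seed` across query sessions.

    `sessions` is a list of sessions, each a list of the terms one user
    queried in one sitting; suggestions are ranked by co-occurrence count.
    """
    co_counts = Counter()
    for session in sessions:
        if seed in session:
            # Count every other term the same user tried in that session.
            co_counts.update(t for t in session if t != seed)
    return [term for term, _ in co_counts.most_common(k)]
```

Raw co-occurrence is only the baseline; the cited work weights candidates by the session context of the current query rather than global counts alone.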

Evaluation
• Bruza, P.D., et al. (2000): compared query-reformulation search with query-based and directory-based search (Hyperindex browser vs. Google vs. Yahoo)
– Query reformulation significantly improved the relevance of the retrieved documents, at the cost of increased search time and cognitive load.
• Belkin et al. (2000): explicit term suggestion outperformed automatic query reformulation
– It increased users' knowledge of how the search engine worked.
– It increased users' control over the engine's suggestions.
– Design suggestions: terms suggested should be related to the search context; with sufficient reason to trust the system's recommendations, users are willing to give up some measure of control.

Lessons Learned (1)
The challenge:
• The gap between user queries and information needs has been a challenge for both users and search engine system designers.
• The vocabulary problem is serious on both sides, too.

Possible solutions:
• Past research has drawn on both automatic and interactive approaches to tackle challenges in iterative information access:
– Automatic/interactive query reformulation
– Automatic web usage mining
– Automatic search context modeling
– Interactive term suggestion


Lessons Learned (2)
Possible solutions (continued):
• Interactive query reformulation is valuable for bringing user queries closer to information needs.
– People with little experience are able to learn the features quite effectively (with relatively little training).
– Users prefer more knowledge about what happened, and more control over what is actually done.
• The effectiveness of log-based term suggestion has not been fully explored yet.

Links to Our Project
Issues:
• How much can we learn from log analysis to bridge the gap between user queries and their information needs?
• Is it possible to build a term-suggestion aiding device based on logs alone?
– How to identify "best bets" associated with a certain term?
– How to identify the most popular pages associated with a certain term?
– How many suggested terms are enough, and at what level?

Term Suggestion: Used For
A long list of possible terms to suggest:
• Synonyms
• Homonyms
• Common misspellings
• Changes in context
• Married name to maiden name
• Abbreviations, etc.

Mechanisms:
• Synonym rings
• Authority files (preferred terms)
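The two mechanisms can be sketched as simple lookup tables. The variant pairs come from the slides (dungarees/jeans; Dr. Fisher's married and maiden names); which member of each ring counts as "preferred" is our choice for illustration.

```python
# Synonym ring: one set of interchangeable variants per concept.
RINGS = [
    {"dungarees", "jeans"},
    {"karen fisher", "karen pettigrew"},
]

# Authority file: every variant maps to a single preferred term
# (the choice of preferred term here is illustrative).
PREFERRED = {
    "dungarees": "jeans", "jeans": "jeans",
    "karen fisher": "karen fisher", "karen pettigrew": "karen fisher",
}

def expand(term):
    """Synonym-ring expansion: return all variants of a term (incl. itself)."""
    for ring in RINGS:
        if term in ring:
            return sorted(ring)
    return [term]

def normalize(term):
    """Authority-file lookup: map any variant to its preferred form."""
    return PREFERRED.get(term, term)
```

A suggester would call `expand` to offer alternatives to the user, while an indexer might call `normalize` so that all variants retrieve the same documents.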

Term Suggestion: Relational
Also a list of possible terms to suggest:
• Pure hierarchical relationships
– Narrower terms
– Broader terms
• Associative relationships
– Explicit: e.g., denim and jeans
– Implied: e.g., "Hemingway wore khakis" and "Rock Hudson's subtle use of khakis made 'A Farewell to Arms' a great movie"

Mechanisms:
• Taxonomies: purely hierarchical
• Thesauri: also associative
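The taxonomy/thesaurus distinction can be sketched as data structures: a taxonomy carries only broader/narrower-term (BT/NT) links, while a thesaurus adds associative related-term (RT) links. Apart from denim, jeans, and khakis from the slides, the sample vocabulary is invented.

```python
# Taxonomy: purely hierarchical -- only broader/narrower-term links.
# "clothing" and "trousers" are invented parent terms for illustration.
NARROWER = {"clothing": ["trousers"], "trousers": ["jeans", "khakis"]}

# Thesaurus: the same hierarchy plus associative related-term (RT) links.
RELATED = {"jeans": ["denim"], "denim": ["jeans"]}

def narrower_terms(term):
    """Narrower terms (NT) one level down the hierarchy."""
    return NARROWER.get(term, [])

def related_terms(term):
    """Associative related terms (RT); always empty in a pure taxonomy."""
    return RELATED.get(term, [])
```

A term suggester built on a taxonomy can only walk up or down the tree; one built on a thesaurus can also jump sideways, e.g. from "jeans" to "denim".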

References: User Behavior Modeling
1. Bates, M.J. (1989). The Design of Browsing and Berrypicking Techniques for the Online Search Interface. Online Review, 13(5): 407-431.
2. Bates, M.J. (2002). Toward an Integrated Model of Information Seeking and Searching. Keynote, The 4th Conference on Information Needs, Seeking and Use in Different Contexts, Lisbon, Portugal, Sep 11-13, 2002.
3. Choo, C.W., Detlor, B., et al. (1999). Information Seeking on the Web: An Integrated Model of Browsing and Searching. In Proceedings of the ASIS Annual Meeting, Washington, D.C.
4. O'Day, V.L. and Jeffries, R. (1993). Orienteering in an Information Landscape: How Information Seekers Get From Here to There. In Proc. of INTERCHI '93, Amsterdam, Netherlands, April 1993.
5. Tauscher, L. and Greenberg, S. (1997). How People Revisit Web Pages: Empirical Findings and Implications for the Design of History Systems. International Journal of Human-Computer Studies, 47: 97-137.
6. Sellen, A., Murphy, R., et al. (2002). How Knowledge Workers Use the Web. In Proc. of CHI 2002: ACM SIGCHI Conference on Human Factors in Computing Systems, Minneapolis, MN.

References: Log Analysis
1. Anick, P.G. (1994). Adapting a Full-Text Information Retrieval System to the Computer Troubleshooting Domain. In Proc. of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 349-359.
2. Anick, P. (2003). Using Terminological Feedback for Web Search Refinement: A Log-based Study. In Proc. of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
3. Jansen, B.J., Spink, A., and Saracevic, T. (2000). Real Life, Real Users and Real Needs: A Study and Analysis of User Queries on the Web. Information Processing and Management, 36: 207-227.
4. Silverstein, C., et al. (1998). Analysis of a Very Large AltaVista Query Log. Technical Report 1998-014, Digital Systems Research Center.


References: Iterative IA Process Analysis

1. Belkin, N.J. (2000). Helping People Find What They Don't Know. Communications of the ACM, 43(8): 58-61.
2. Bruza, P.D. and Dennis, S. (1997). Query Reformulation on the Internet: Empirical Data and the Hyperindex Search Engine. In Proc. of RIAO '97, pp. 488-499.
3. Efthimiadis, E. (1996). Query Expansion. Annual Review of Information Science and Technology, 31: 121-187.
4. Wen, J.-R., et al. (2001). Clustering User Queries of a Search Engine. In Proc. of the 10th International World Wide Web Conference, pp. 162-168.

References: Term Suggestion
1. Anick, P. (2003). Using Terminological Feedback for Web Search Refinement: A Log-based Study. In Proc. of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
2. Belkin, N.J., Cool, C., Head, J., Jeng, J., Kelly, D., Lin, S., Park, S.Y., Savage-Knepshield, P., and Sikora, C. (2000). Relevance Feedback Versus Local Context Analysis as Term Suggestion Devices: Rutgers' TREC-8 Interactive Track Experience. In Harman, D. and Voorhees, E. (Eds.), TREC-8: Proceedings of the Eighth Text Retrieval Conference. Washington, D.C.: NIST, pp. 565-574.
3. Huang, C., Chien, L., and Oyang, Y. (2003). Relevant Term Suggestion in Interactive Web Search Based on Contextual Information in Query Session Logs. Journal of the American Society for Information Science and Technology, 54(7): 638-649.

References: Web Usage Mining
1. Belkin, N.J., Marchetti, P.G., and Cool, C. (1993). BRAQUE: Design of an Interface to Support User Interaction in Information Retrieval. Information Processing and Management, 29(3): 325-344.
2. Horvitz, E., Breese, J., Heckerman, D., Hovel, D., and Rommelse, K. (1998). The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users. In Proc. of UAI '98, Madison, WI, July 1998.
3. Howe, A. and Dreillinger, D. (1997). Savvy Search: A Metasearch Engine That Learns Which Search Engines to Query. AI Magazine, 18(2): 19-25.
4. Mobasher, B., Cooley, R., and Srivastava, J. (2000). Automatic Personalization Based on Web Usage Mining. Communications of the ACM, 43(8): 143-151.
5. Slaney, M., Subrahmonia, J., and Maglio, P. (2003). Modeling Multitasking Users. Lecture Notes in Computer Science, vol. 2702, pp. 188-197. Springer-Verlag, Heidelberg, Jan 2003.
6. Zhu, T., Greiner, R., and Haubl, G. (2003). Learning a Model of a Web User's Interests. Lecture Notes in Computer Science, vol. 2702, pp. 65-75. Springer-Verlag, Heidelberg, Jan 2003.

References: Automatic/Interactive Query Reformulation

1. Beaulieu, M. (1997). Experiments on Interfaces to Support Query Expansion. Journal of Documentation, 53(1): 8-19.
2. Belkin, N.J., Cool, C., Koenemann, J., Ng, K.B., and Park, S. (1996). Using Relevance Feedback and Ranking in Interactive Searching. In Harman, D. (Ed.), TREC-4: Proceedings of the Fourth Text Retrieval Conference, Washington, D.C., pp. 181-209.
3. Harman, D. (1988). Towards Interactive Query Expansion. In Chiaramella, Y. (Ed.), Proc. of the 11th International Conference on Research and Development in Information Retrieval, Grenoble, France, pp. 321-331.
4. Koenemann, J. (1996). Supporting Interactive Information Retrieval Through Relevance Feedback. In Proc. of ACM SIGCHI 1996.

References: Context Modeling
1. Chen, H. (2003). Introduction to the JASIST Special Topic Section on Web Retrieval and Mining: A Machine Learning Perspective. Journal of the American Society for Information Science and Technology, 54(7): 621-624.
2. Chen, M., Hearst, M., Hong, J., and Lin, J. (1999). Cha-Cha: A System for Organizing Intranet Search Results. In Proc. of the 2nd USENIX Symposium on Internet Technologies and Systems (USITS), Boulder, CO, October 11-14, 1999.
3. Dumais, S., Cutrell, E., and Chen, H. (2001). Optimizing Search by Showing Results in Context. In Proc. of CHI 2001.
4. Lawrence, S. (2000). Context in Web Search. IEEE Data Engineering Bulletin, 23(3): 25-32.

References: Evaluation
1. Bruza, P.D., McArthur, R., and Dennis, S. (2000). Interactive Internet Search: Keyword, Directory and Query Reformulation Mechanisms Compared. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
2. Efthimiadis, E.N. (2000). Interactive Query Expansion: A User-based Evaluation in a Relevance Feedback Environment. JASIS, 51(11): 989-1003.


Thank you. Any questions?

