HITIQA: High-Quality Interactive Question-Answering

A proposal submitted to:

Office of Advanced Analytic Tools
1760 Business Center Drive
Reston, VA 20190

Submitted by:

University at Albany, State University of New York
Rutgers University

Contact Address
University at Albany, SUNY
Office of Sponsored Research
1700 Western Avenue
Albany, New York 12222


Table of Contents

1. Executive Summary
2. Project Objectives & Innovative Claims
3. Technical Rationale and Approach
   3.1. Question Semantics and Classification
   3.2. Clarification Dialogue
   3.3. Information Quality Metrics
   3.4. Data Fusion
   3.5. User Models and Interaction Patterns
      3.5.1. Experimental Setup
      3.5.2. Learning User Models and Quality Metrics
   3.6. Information Visualization and Navigation (RE)
   3.7. Related Research
   References
4. Statement of Work
   Task 1. Information Quality Metrics
      1.1. Quality Parameters
      1.2. Web-based Tools
      1.3. User Experiments
      1.4. User Models
   Task 2. Interactive Question Answering
      2.1. Question Analysis and Classification
      2.2. Semantic Tools for Text Processing
      2.3. Integrating Clarification Dialogue
      2.4. Answer Generation
   Task 3. Multimedia Dialogue
      3.1. Quantitative Text Processing
      3.2. Dialogue Management
      3.3. Text and Graphics Input/Output
   Task 4. Information Visualization
   Task 5. Data Fusion
      5.1. Fusion Parameters
      5.2. Automatic Adjustment of Fusion Parameters
   Task 6. Evaluation and User Studies
      6.1. Formative Evaluation
   Task 7. Project Management
5. Schedule and Milestones
6. Personnel Qualifications
   Resumes
7. Facilities (ALL)


1. Executive Summary

High-Quality Interactive Question-Answering (HITIQA) technology will allow users of information systems to pose questions in natural language (written or spoken) and obtain relevant, factual answers, or the assistance they require in order to perform their tasks. For example, the question "How long does it take to fly from New York to Paris on a Concorde?" would be expected to generate the answer "3.5 hours." Similarly, the request "I need to install a Windows component" would produce a series of appropriate instructions, while "What was Russia's reaction to the U.S. bombing of Kosovo?" would yield a comprehensive report prepared on the issue. These exchanges will not happen in isolation; in most cases the system will engage the users in a dialogue to clarify their intentions and goals, while navigating visually through a multidimensional information space. The information necessary to answer users' requests may be available to the system (which is by no means guaranteed), but its exact format is not known a priori: it could be a database record or a short text passage, or it could be scattered among many documents; it could be stated explicitly or it may have to be inferred. The users could, of course, find the answers they require by searching the available data through other access means, e.g., a document retrieval system or a database search with structured queries; however, none of these would match the convenience and directness of HITIQA. In HITIQA, information delivered to the user is not only relevant but also useful and tailored to the tasks the user is performing. Moreover, the information is of the highest quality possible relative to the user's task and needs: it is as timely, reliable, trustworthy, and accurate as it can be, with a degree of confidence attached.


This project aims to make significant advances in the state of the art of automated question answering by focusing on the following key research issues:

Question Semantics: how the system "understands" user requests.
Human-Computer Dialogue: how the user and the system negotiate this understanding.
Information Quality Metrics: why some information is better than other information.
Information Fusion: how to assemble the answer that fits user needs.

The project will involve several cycles of experiments with users performing a variety of tasks. Empirical data gathered from these experiments will be analyzed to induce models for automated assessment of information quality and for optimizing information fusion. These models will be embedded in the evolving HITIQA prototype. The concept is illustrated in Figure 1.

2. Project Objectives & Innovative Claims

The central objective of the HITIQA project is to develop Question-Answering technology suitable for operational use by professional analysts (see Box 1). We anticipate that this technology will also have a significant impact on how casual users find information on the Internet and elsewhere. The distinguishing property of HITIQA technology is that it will deliver an explicit and direct answer to the user, producing results that are immediately usable for whatever task may be at hand. This is most clearly contrasted with Information Retrieval (IR) technology, which delivers ranked lists of document "hits" that are only likely to contain elements of an answer. Existing Q&A-like systems and services, e.g., the Internet-based Ask Jeeves, do not so much answer questions as direct the user to information that has previously been determined to contain the answers. For most information needs, keyword-based document search and its variants are the only technology available today to analysts and Internet users alike.

HITIQA should also be contrasted with the classical Q&A technology proposed by Lehnert and others in the early 1980s. Where those Q&A systems were built for small, constrained domains using carefully structured knowledge bases, HITIQA will offer open-domain technology that can work with practically unlimited and unstructured databases. What sets HITIQA apart from the early Q&A work, and also enables this radical increase in scale, is its innovative approach to the semantics of both questions and answers, including the use of clarification dialogue.

BOX 1. HITIQA RESEARCH OBJECTIVES
o High-quality responses to analyst information requests, tailored to tasks.
o Robust, domain-independent semantics for questions and answers.
o Data-driven human-computer dialogue to negotiate request semantics.
o Metrics for ranking information by quality and relevance (Q&R).
o Novel information fusion techniques to assemble high-quality responses.
o Information visualization to support visual navigation and dialogue.


In order to accomplish these objectives, this project will deliver the following significant innovations:

We propose to develop and evaluate a new model for natural language question semantics based on sub-categorization of relevant information available to the system. This will be accomplished by filtering multiple information sources that are likely to contain elements of the answer and performing multi-faceted categorization to expose various dimensions of the potential answer space, including themes, aspects, and other significant groupings.

We propose to embed in HITIQA an innovative data-driven dialogue management system. The central role of the dialogue component will be to negotiate with users the exact intended meaning of the questions they pose. This is necessary to narrow the gap between the users' expectations (and therefore the semantics they ascribe to their questions) and the system's "understanding" or "misunderstanding" of these questions.

We propose to develop, using machine learning techniques applied to empirical data, an extended model for classifying information by quality, in addition to and as an extension of the traditional notion of relevance. The quality metrics will include criteria that information analysts consider essential in their work: usefulness, reliability, trustworthiness, identification of bias, un-ambiguity, etc. We will conduct a series of experiments with users performing analyst-type tasks, while our system monitors their actions and responses. These data will be converted into models and metrics for automated assessment of information quality in text documents, using machine learning techniques that have been successful in other language processing applications, e.g., text tagging and topic detection.

We propose to develop innovative data fusion techniques and apply these to ranking and combining information from multiple sources, using multiple quality and relevance (Q&R) criteria. The objective is to build non-parametric models of fusion based on empirical evidence collected from users performing fusion tasks. The system will support fusion of multiple retrieval schemes, and other methods for estimating document and passage relevancy, to achieve better retrieval performance than any one scheme can provide.

We propose to integrate into HITIQA information visualization capabilities that will support visual-level communication and dialogue with the user. Visual displays will capture the system's interpretation of the potential answer space and show how it changes as the question-answering process progresses towards its resolution. In combination with interaction and steering capabilities, these displays will greatly improve the user's comprehension of system parameters and effectiveness within the environment. The available information sources will ultimately be explored more fully, both in the number of documents analyzed and in the depth to which they are analyzed.

The proposed system will cooperate with an analyst to address several critical issues in Question Answering:

The system will analyze the meaning of analyst questions in the context of available information, user profiles, and task characteristics.

The system will classify questions into categories in order to support answering certain types of questions by analogy.


The system will engage the analyst in a dialogue, using both linguistic and visual means, in order to clarify the meaning and intent of the question.

The system will solicit explicit estimation of source and document quality from the users, permitting multi-dimensional characterization of quality.

The system will solicit explicit indication of the factors influencing estimation of source and document quality, supporting automated characterization of quality based on those factors.

The system will learn the user's preferences for quality indicators and for data fusion, and adapt to present those documents meeting the user's explicit and implicit preferences.

The system will provide an interactive interface that lets the user teach the system about her preferences by dragging significant documents closer to the top ranking in a two-dimensional array.

The system will provide visual assistance in exploring and analyzing document characteristics.

The system will incorporate steering and interaction for knowledge-based direction of execution, improving efficiency and accuracy in exploration and analysis.

The system will support modular replacement of its components, so that various algorithms, display schemes, etc., can be tested in efficient Taguchi-design experiments.

3. Technical Rationale and Approach

HITIQA grows out of several fields of information and language processing research (Box 2), including classical question-answering (Q&A), information retrieval and summarization, human-computer dialogue, information fusion, information visualization, and information quality research. Until now, classical Q&A has been a laboratory-bound technology, used to study advanced linguistic properties and to build small-scale, limited-domain demonstration systems (Lehnert, 1982). On the other hand, Information Retrieval (IR) research, which also includes text categorization, summarization, and topic detection, has been quite successful at producing general-purpose automated systems as well as commercial applications. A typical IR system does not, of course, provide answers to questions; rather, it returns a list of documents, or pointers to documents, where relevant information is likely to be found.

BOX 2: HITIQA COMPONENTS
Question-Answering
  o Accept user questions and problem descriptions as input.
  o Rank responses by quality metrics relative to the user's task.
Information Quality Metrics and Evaluation
  o Collect empirical data through experiments with users performing real tasks.
  o Derive models for information quality assessment, based on empirical evidence.
Interactive Dialogue & Visual Navigation
  o Assist the users in formulating their information needs and goals.
  o Negotiate user and system understanding of questions.
Information Fusion and Summarization
  o Summarize, combine, and fuse information from multiple sources.
  o Learn user models and interaction patterns to provide optimal results.


Question-answering emerged as a business application in recent years, partially as a response to the dismal precision of Internet search and other information onslaught devices. This need for increased accuracy and utility of information search and retrieval has been most acutely felt by professional analysts and, in somewhat different ways, by Internet users, who need to cope with ever-growing volumes of data. Building appropriate applications, however, requires Q&A technology of a significantly more advanced kind than what the classical research was aiming at. Indeed, there is a whole spectrum of questions that real users can ask, many of which require a system to perform a great deal of work before an acceptable answer can be assembled. For example, comparative questions such as "What is the fastest route from Albany to Ithaca?" may require information from several sources to be compiled. Professional analysts and news reporters will require "answers" to problem statements, which may take the form of a comprehensive report on a subject, assembled from many sources and documents, e.g., "What recent disasters occurred in tunnels used for transportation?"

In order to provide answers to such questions or requests, an automated system must have access to nearly unlimited amounts of information, in many different formats: text documents, speech recordings, video images, web pages, etc. This information needs to be pre-filtered to pull out what may be relevant to the task, a step that can be accomplished using standard information retrieval techniques. The next step is to analyze the content of the relevant set in order to establish the initial "meaning" of the question. Furthermore, the system must be able to assess the quality of the information it is dealing with, and whether it meets the users' information needs as well as their quality criteria, which may vary from user to user and from task to task. A dialogue sub-system is required for the user and the system to negotiate their understanding of the question and modify it as necessary. Moreover, visual representation and navigation through abstract information spaces is needed to facilitate this dialogue and to provide continuity of context. Finally, intelligent fusion techniques are necessary to identify the most valuable documents or passages (that is, to choose the most appropriate information) and to combine them into a high-quality answer. Below, we discuss details of each of the HITIQA components.

3.1. Question Semantics and Classification

A successful question-answering system must be able to interpret and understand users' requests, as well as their intentions and the context in which they operate. Moreover, this understanding must be independent of the question's specifics or subject matter. To achieve this, we need to study question semantics, and more broadly natural language semantics, a discipline that has in recent years been largely relegated to the backbenches of Computational Linguistics. In HITIQA, we propose to explore a computationally tractable approach to question semantics, in which a question's meaning is tied to the information that is considered relevant to it. Clearly, the ability to locate and identify what information may be relevant to a question will differ in interesting ways between the human and the machine. If a question were submitted as a query to an information retrieval system, then its semantics would be tied to the set of highly-ranked documents retrieved. This retrieved set may have relatively low precision according to human judgment; nonetheless, its composition reflects the variety of interpretations of the query that are possible for the system. We may be able to compute some of these interpretations by identifying themes or dimensions within the retrieved set, e.g., by clustering documents and passages around keywords or features such as named entities, word co-occurrences, and so forth. Therefore, the relevant set, along with its sub-clusters and themes, constitutes the system's understanding of the question; it becomes the semantics of the question as seen by the system (Figure 2).
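
As an illustration only, the theme-identification step might be prototyped along the following lines. This is a minimal sketch assuming a bag-of-words representation and an off-the-shelf clustering algorithm; the function name, the cluster count, and the use of scikit-learn are our assumptions, not part of the proposal.

```python
# Sketch: derive a question's "semantics" as themes in its retrieved set.
# Assumes scikit-learn; k (the number of themes) is a hypothetical choice.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

def identify_themes(retrieved_docs, k=4):
    """Cluster retrieved documents into candidate themes/dimensions."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(retrieved_docs)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    terms = np.array(vectorizer.get_feature_names_out())
    themes = []
    for c in range(k):
        # Label each theme by the highest-weight terms in its centroid.
        top = km.cluster_centers_[c].argsort()[::-1][:5]
        members = [d for d, lab in zip(retrieved_docs, km.labels_) if lab == c]
        themes.append({"label": list(terms[top]), "docs": members})
    return themes
```

The theme labels produced this way could then feed directly into the clarifying questions discussed below.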

This interpretation of a question is likely to be different from what the user meant by it. Perhaps the question wasn't precise or specific enough; or else the system simply lacks the ability to fully comprehend it. Yet this obvious disconnect is not necessarily a problem, as long as the differences can be negotiated with the user. If the retrieved set contains multiple themes, it is likely that the user is not interested in all of them. The system may take the initiative and ask a clarifying question in an attempt to reduce the dimensionality of the answer space. The clarifying question will be construed along some of the perceived dimensions of the retrieved set, and it will serve two purposes: (1) to make the original question more specific, and (2) to solicit any additional cues from the user. It is possible, of course, that the clarifying question would miss the theme distinction that the user cares about:

USER: What recent disasters occurred in tunnels used for transportation?
SYST: Are you interested in train accidents, automobile accidents, or others?
USER: Any that involved loss of life or major disruption in communication.

We can avoid irritating the user by phrasing the clarifying question in such a way as to induce a helpful response, i.e., so that its second purpose is achieved. This aspect will be handled by the Dialogue Component, discussed in the next section.


As described above, this notion of question semantics assumes nothing about the question's syntax or format. In traditional Q&A, questions are parsed to establish their focus, usually a phrase following which, what, where, etc., because the focus frequently determines the type of the answer. For example, the focus of "Which fashion designer decided that Michael Jackson should wear only one glove?" is "fashion designer," and thus the answer is expected to be a HUMAN (Harabagiu, 2000). Syntactic focus is usually sufficient for simple factual questions, with the exception of ambiguities created by some how and why questions, e.g., "Why did the U.S. send the Air Force to Kosovo?" could have its focus on any of the three noun phrases, depending upon the context, background knowledge, stress, etc. In more complex questions and problem statements, explicit syntactic focus plays a lesser role, while the semantic focus may be implicit. Nonetheless, it may be desirable to categorize questions into classes for which more specialized processing could be designed. For example, some questions may be answered by analogy to other already-answered questions, e.g., "What was the reaction of <COUNTRY> to <EVENT>?" may require a similar style of answer whether the country is the U.S., Japan, or Germany. A preliminary question classification will be derived from the data obtained from experiments with users, as described in Section 3.5. Further refinement will be added as the system is tested and used in an operational setting.
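
To make the by-analogy idea concrete, a first-cut classifier might match questions against a small library of slot patterns. The patterns and class names below are hypothetical illustrations of such a library, not a committed design:

```python
# Sketch: classify questions into analogy classes via slot patterns.
# The two templates here are hypothetical examples.
import re

QUESTION_TEMPLATES = [
    (re.compile(r"what was the reaction of (?P<country>[\w\s.]+) to (?P<event>[\w\s.]+)\?", re.I),
     "REACTION-OF-COUNTRY-TO-EVENT"),
    (re.compile(r"what is the fastest route from (?P<src>[\w\s]+) to (?P<dst>[\w\s]+)\?", re.I),
     "COMPARATIVE-ROUTE"),
]

def classify_question(question):
    """Return (class, slot bindings) for the first matching template."""
    for pattern, qclass in QUESTION_TEMPLATES:
        m = pattern.match(question)
        if m:
            return qclass, m.groupdict()
    return "UNCLASSIFIED", {}

print(classify_question("What was the reaction of Japan to the Kosovo bombing?"))
# -> ('REACTION-OF-COUNTRY-TO-EVENT', {'country': 'Japan', 'event': 'the Kosovo bombing'})
```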

The field of question answering has been the subject of intense research over the past two years, as many experimental systems participated in the Q&A evaluation organized as part of the annual Text REtrieval Conference (TREC). The goals of this evaluation have thus far been quite modest. For example, the evaluation guidelines explicitly limit the types of questions that can be asked and the types of answers that can be given. Specifically, only simple fact-finding questions are permitted, and only if they can be answered using a phrase or a similar word sequence that is entirely contained in a single text document. For example, the question "What is the population of China?" could be answered in several different ways, depending upon the source, but systems were not required to supply a comprehensive response, something that an intelligence analyst may expect. The research proposed here moves above and beyond TREC-style question answering.

3.2. Clarification Dialogue

The question-answering process is rarely a simple take-question-give-answer affair and may require the sides to exchange further information before the process can be completed. In fact, we argue that, with the exception of the simplest cases, a dialogue is required to make sure that the user obtains the service he or she requires. In information retrieval, relevance feedback is a powerful mechanism for obtaining high-precision and high-recall results. Relevance feedback allows the users to review preliminary search results and indicate those pieces that they find particularly relevant. The search query is subsequently modified to reflect these preferences (usually by adding or re-weighting some keywords) and the search is repeated. We may note that relevance feedback is a form of dialogue, albeit a very primitive one where the user has to make all the moves: ask the question, assess the results, indicate how to reformulate the query. Because the dialogue is so one-sided, the entire process is also a grossly inefficient trial-and-error exercise.


In question-answering, where user expectations of accuracy and quality of results are understandably higher than in document retrieval, there may be little tolerance for answers that are off-target, let alone not relevant. This may be illustrated by the following hypothetical Q&A session:

Q: How do I get from Bellagio to Venice?
A: Drive south-east towards Lecco and Bergamo, then take autoroute A4 east to Venice.
Q: Umm… well… can I get there by train?

In this example, the initial question presumably retrieves a number of alternative directions, but only one of them, presumably the highest-scored one, is returned as an answer. In the absence of any contextual information that may justify a certain preference, this answer choice is more or less random. A better way would be to subcategorize the retrieved set into, say, driving directions, train directions, bus directions, etc., and then seek clarification either in the user profile or by asking the user a question. We can achieve this by adopting the data-driven, open-domain question semantics proposed in the previous section.

Clarification dialogue is a critical component of the question-answering process in HITIQA. The overall objective of the dialogue is to narrow the gap between the system and the user in their interpretation of the question (Figure 3). The dialogue is initiated by the system when the potential answer space, as defined by the initial retrieved set, can be partitioned or subcategorized along multiple dimensions. The number and exact nature of the dimensions that may be required are unknown at this time, and will be investigated for different tasks and users. The sub-goals of the clarification dialogue are as follows:


o Reduce the dimensionality of the answer set: have the user select or reject some dimensions, by examining the user profile or asking a question.

o Fill in any gaps in the expected answer: ask the user to supply a missing piece of information necessary to calculate the answer (e.g., age, address, etc.).

o Resolve any ambiguities in the question: ask the user to provide additional data points whenever the system is uncertain what to do next.

We have experience in designing a successful dialogue system based on similar principles. One of us (Strzalkowski) led the team that implemented an automated call-center prototype, Parts-Id, for GE Medical Systems (Bagga et al., 2000). This system used a spoken-language interface to interact with a database of replacement parts for medical equipment. The system was designed to support GE technical personnel on field assignments. Here is an actual dialogue recorded between the Parts-Id system and a test user:

PARTS-ID: Welcome to the GE Parts ID Center. Please give me the part that you need the ID for?
CALLER: I would like the part number for a suction cup. [Ambiguous!]
PARTS-ID: Which product do you need this part for?
CALLER: 3Mlasercam [speech recognition challenge!]
PARTS-ID: Is 3MLasercam the product name that you said?
CALLER: Yes
PARTS-ID: The part ID for part SUCTION CUPS FOR FILM PICK UP used in product 3MLASERCAM is 46-266975P10. Do you want to exit or go back to the main menu?

In Parts-Id, the search is conducted over a database of records specifying part descriptions, numbers, and the equipment, complete with variants, versions, etc. The retrieved set consists of one or more such records, or it could be empty. Whenever multiple records are retrieved, the system asks follow-up questions to determine the right equipment, model, version, etc. This is done dynamically, by partitioning the retrieved set along its multiple dimensions and selecting an optimal dialogue strategy to reduce the set to just one member.
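
One simple way to realize such a strategy, sketched below under our own assumptions (the proposal does not describe Parts-Id's internals), is to ask about whichever attribute best splits the remaining records, e.g., the one with maximum entropy of its value distribution:

```python
# Sketch: pick the follow-up question that best partitions the retrieved records.
# This is an illustrative strategy (entropy-based splitting), not the actual
# Parts-Id algorithm, which is not detailed in the proposal.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of the value distribution for one attribute."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_question(records, attributes):
    """Return the attribute whose values split the record set most evenly."""
    return max(attributes, key=lambda a: entropy([r[a] for r in records]))

records = [
    {"part": "suction cup", "product": "3MLasercam", "version": "P10"},
    {"part": "suction cup", "product": "3MLasercam", "version": "P12"},
    {"part": "suction cup", "product": "CT-Max", "version": "P11"},
]
# "version" has the most even split here, so it is asked about first.
print(best_question(records, ["product", "version"]))
```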

In HITIQA, the main challenge is to adapt this dialogue model to unstructured data. We believe that this is achievable using the data-driven definition of question semantics discussed above.

3.3. Information Quality Metrics

A conventional approach to evaluating information is by the standard of relevance. While relevance is mainly reflected through topicality, researchers have found that topical relevance, even though it may be a necessary condition for information retrieval, is by no means a sufficient condition for document evaluation (Boyce, 1982; Schamber et al., 1990). The narrow concept of relevance is too limiting to characterize the complex information needs of an intelligence analyst, or even of many Internet users. For example, a detailed technical manual may be "relevant" but would not be very useful to a prospective car buyer. Similarly, a paid commercial advertising a product or service is unlikely to be objective, nor would it be expected to contain complete information contrasting it with a competing product. Information quality assessment is particularly important when sensitive and potentially harmful information is handled, such as intelligence data.

Information Quality is defined as the "totality of characteristics of an entity that bears on its ability to satisfy stated and implied needs" (ISO, 1994). With the current growth of the Internet, there is an increasing demand for delivering not only relevant information, but also high-quality information. A request for information about the mosquito-carried West Nile virus in the New York metropolitan area would be better served by a detailed report from, say, cnn.com than by an informal UseNet message. Both pieces of information may be equally relevant, but one better meets the user's quality requirements. Quality concerns are especially crucial for an intelligence analyst, who daily faces the challenge of selecting and filtering through multiple information sources, many of questionable veracity.

Previous research has investigated the issue of information quality from two perspectives: general quality criteria and measurable quality indicators (Table 1). The quality criteria are related to human perception of an information object, be it a text, a graphic file, a passage, or a multimedia presentation, and thus tend to be somewhat subjective. These criteria cover content quality, presentation quality, as well as authority and timeliness. In contrast to these, measurable quality indicators include quantifiable document features such as the number of references or links to it, how often it is accessed and by whom, etc. Such objective indicators are easy to compute, and sometimes they are predictors of intrinsic quality, but of course they alone cannot guarantee it. The challenge is to expand the set of quality indicators to include linguistic, stylistic, structural, and collection properties of information, and then link them to human quality assessments in such a way that the latter can be computed and predicted with a degree of confidence.

Table 1 lists a selection of quality criteria and quality indicators that we believe to be essential for our project. The left column shows the quality criteria, grouped into four classes: Content, Authority, Presentation, and Timeliness. Under the CONTENT category, "Accuracy and Objectivity" concerns the degree of correctness of a piece of information, and also whether its objectivity or bias can be assessed. "Verifiability," on the other hand, requires that certain facts be confirmable from independent sources. We expect that these criteria will be of interest to analysts who deal with information coming from a variety of sources. Similarly, the "Credibility" feature in the AUTHORITY category covers things like authors' affiliations and credentials, the degree of standardization of the review process, and the reputation of the publisher. In the PRESENTATION category, "Readability" of a document requires an assessment of how easy a given document is to follow and comprehend, another important feature in an analyst's work.

Since most of the quality criteria are difficult to quantify, many researchers have turned to more objective, measurable quality indicators (right column in Table 1). Previous studies have shown that link-based metrics serve as a very good predictor of the quality ratings of web documents (Amento et al., 2000). In Table 1, the link-based metrics include the in-degree measure, the out-degree measure, a measure of the credibility of the citing/linking authors, as well as a measure of document size. One way to estimate the quality of a document is to count the number of citations or links to it, then normalize by the document size and the credibility factor assigned to each citation or link, thus obtaining a numerical quality score. To these we can add certain linguistic, structural, and stylistic features, which we found to be useful predictors of document relatedness in TREC retrieval experiments. For example, stylistic features of text can help to determine whether we are looking at a technical report, a news piece, or a newsgroup posting (Karlgren, 1998). Similarly, linguistic features such as references to people and places can help estimate the credibility of a document. Furthermore, we may be able to estimate the reliability of information by finding independent accounts in different documents. When all such features are considered, we may be able to assign to each document an N-dimensional quality vector.
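
As a concrete, deliberately simplified illustration of such a link-based score, normalized by size and per-link credibility as described above (the specific weighting and normalization are our assumptions):

```python
# Sketch: a credibility-weighted, size-normalized link score as one entry of a
# larger N-dimensional quality vector Q(d). The exact normalization is illustrative.
import math

def link_quality_score(inbound_links, doc_size_words):
    """inbound_links: list of credibility factors in [0, 1], one per citing link."""
    weighted_citations = sum(inbound_links)
    # Normalize by the log of document size so long documents are not unduly favored.
    return weighted_citations / math.log2(2 + doc_size_words)

def quality_vector(doc):
    """Assemble a small quality vector Q(d) from measurable indicators."""
    return [
        link_quality_score(doc["inbound_credibility"], doc["size_words"]),
        doc["num_out_citations"] / max(1, doc["size_words"] / 1000),  # out-degree density
        doc["mean_sentence_length"],  # one stylistic feature
    ]

doc = {"inbound_credibility": [0.9, 0.4, 0.7], "size_words": 1200,
       "num_out_citations": 15, "mean_sentence_length": 21.3}
print(quality_vector(doc))
```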

We will conduct experiments with users and collect quantitative data on how they apply quality criteria to the information they work with. Given human assessments of quality and the external factors computed for the source documents, we will attempt to generate quality models based on quantifiable factors. This may require additional external factors to be considered, such as word sequences or the presence of certain concepts. Once a suitable model is created, it will be possible to measure document quality automatically by assigning a quality vector to each information object. The quality vector, together with the relevance vector, will be used in data fusion for information display. Given user preferences and task requirements, documents can be ranked and fused along the appropriate quality dimensions. A detailed description of the experimental setup for identifying valid internal quality criteria is given in Section 3.5.

TABLE 1. PRELIMINARY QUALITY INDICATORS

Quality Criteria:
  CONTENT
    o Accuracy and Objectivity
    o Completeness
    o Uniqueness and Importance
    o Verifiability
  AUTHORITY
    o Reliability
    o Credibility
  PRESENTATION
    o Clarity and Un-ambiguity
    o Style and Gravitas
    o Orientation and Level
    o Readability and Usability
  TIMELINESS
    o Recency
    o Currency

Measurable Quality Indicators (traditional + proposed):
  IN-DEGREE MEASURE
    o Number of cites or links to the document
    o Credibility of these cites/links
  OUT-DEGREE MEASURE
    o Number of citations in the document
  DOCUMENT SIZE
    o Size of the document in words/sections…
  STYLISTIC FEATURES
    o Typical sentence length
    o Use of pronouns, punctuation, etc.
  LINGUISTIC FEATURES
    o Choice of sentence forms, verbs
    o References to names, amounts, …
  STRUCTURAL FEATURES
    o Organization of sections
    o Use of section titles, etc.
  COLLECTION FEATURES
    o Confirmations and contradictions


3.4. Data Fusion

Data fusion is a relatively new concept. It is an approach that combines data, evidence, or decisions coming from (or based on) various sources, of different natures, about the same set of objects, in order to increase the quality of decision making under uncertainty about the objects (Varshney, 1997). Generally there are three levels of data fusion: the primary data level, the attribute level, and the decision level (Kantor, 1994). On the primary level, all the information available to the detecting systems is considered together in the fusion process to make an overall estimate. On the attribute level, primary signals detected from the objects by different detecting systems are processed into a set of specific attributes, and decisions about the objects are made according to an optimal decision rule based on such attributes. On the decision level, each detecting system individually makes its own partial decision about the objects using its own data according to its own criteria, and a final decision is made based on these partial decisions. Empirically, data fusion in IR works well, and many data fusion experiments have been reported (e.g., Bartell et al., 1994; Vogt et al., 1997). Researchers are beginning to explore the theoretical foundations of data fusion in IR (Lee, 1997; Ng and Kantor, 1998, 2000). In this project, we will employ data fusion methods to combine evidence from different sources about either the relevance or the quality of a document or passage.

In addition, we will use several different search engines (retrieval schemes) to estimate the relevance of documents to the query or sub-query. These will include: the latest research version of InQuery; the latest research version of SMART; the latest research version of MgQuery; and the latest version of NIST's ZPRISE. These will be augmented by natural language techniques developed by TS and colleagues. These engines, and the various settings at which they are operated, will produce a second vector, the vector of relevance scores, R(d) (Figures 5 and 6).
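
A minimal sketch of decision-level fusion over such engines, using normalized-score summation (a CombSUM-style rule, one standard choice from the fusion literature; the engine labels and scores below are placeholders):

```python
# Sketch: decision-level fusion of relevance scores from several retrieval schemes.
# CombSUM-style: min-max normalize each engine's scores, then sum per document.
def normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def fuse_relevance(runs):
    """runs: list of {doc_id: raw_score} dicts, one per retrieval scheme."""
    fused = {}
    for run in runs:
        for d, s in normalize(run).items():
            fused[d] = fused.get(d, 0.0) + s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

engine_a = {"doc1": 12.0, "doc2": 7.5, "doc3": 3.1}    # e.g., one engine's raw scores
engine_b = {"doc2": 0.91, "doc3": 0.85, "doc4": 0.10}  # e.g., another engine's scores
print(fuse_relevance([engine_a, engine_b]))
```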

The heart of the fusion-interface research problem is to convert these two vectors of scores into a new vector, called the display scores D(d), which controls where the icon representing each document appears on the analyst's screen, and how the variable features of that icon (size, shape, color, orientation, etc.) are linked to the characteristics of the document. The entire process can be described as the study of the relation between the characteristics Q and R and the display D, which we will represent as D = f(Q,R). The vector D(d) includes both the coordinates assigned to the point representing d, (dx, dy), and the iconic characteristics, which we represent by a single label dI. There is also an important interplay between the aspect of the relation f that is determined by the system (call it S) and the aspect that is controlled by the user (call it U). While our ultimate goal is to have most of the relation understood by the system, so that the user does not have to exert much control, we must always allow the user to override. In modeling the relation between these two parts, S and U, we may use multiplication or addition.
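
The following sketch shows one concrete reading of D = f(Q,R), with system weights S and user weights U combined either additively or multiplicatively (the open design question above); the specific mapping to screen coordinates is our own illustrative assumption:

```python
# Sketch: map quality vector Q(d) and relevance vector R(d) to display coordinates
# (dx, dy), with system weights S and user overrides U combined by either
# addition or multiplication.
def combine(S, U, mode="add"):
    if mode == "add":
        return [s + u for s, u in zip(S, U)]
    return [s * u for s, u in zip(S, U)]  # mode == "mul"

def display_coords(Q, R, S_q, U_q, S_r, U_r, mode="add"):
    """dx aggregates weighted quality; dy aggregates weighted relevance."""
    wq = combine(S_q, U_q, mode)
    wr = combine(S_r, U_r, mode)
    dx = sum(w * q for w, q in zip(wq, Q))
    dy = sum(w * r for w, r in zip(wr, R))
    return dx, dy

Q = [0.8, 0.3, 0.6]   # e.g., credibility, recency, readability scores
R = [0.9, 0.7]        # e.g., scores from two fused retrieval schemes
print(display_coords(Q, R, S_q=[1.0, 0.5, 0.5], U_q=[0.2, 0.0, 0.3],
                     S_r=[1.0, 1.0], U_r=[0.0, -0.2], mode="add"))
```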

3.5. User Models and Interaction Patterns

Learning of user models will integrate the information gained from the users by Tang and Ng with off-the-shelf machine learning packages, to lay the foundations for automatic estimation of user preferences and automatic estimation of document quality. We will use packages such as RIPPER and C4.5Rules, as well as statistical pattern recognition methods, in this task. The effectiveness of the model will be tested by comparing, after the fact, the automatic estimates of preferences and qualities with the user judgments collected during Act I and Act II of the user studies.
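
A sketch of this preference-learning step, using a decision-tree learner as a stand-in for RIPPER or C4.5Rules (which produce rule sets in the same spirit); the feature set, labels, and use of scikit-learn are hypothetical illustrations:

```python
# Sketch: learn to predict a user's quality judgment from measurable indicators.
# A decision tree stands in for the rule learners (RIPPER, C4.5Rules) named in
# the text; the features and toy labels here are illustrative only.
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [in-degree, mean sentence length, has section titles (0/1)]
X = [[25, 18.0, 1], [2, 31.5, 0], [40, 20.2, 1], [1, 45.0, 0], [15, 22.1, 1]]
y = ["high", "low", "high", "low", "high"]  # user's quality judgments (Act I)

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(model, feature_names=["in_degree", "sent_len", "has_titles"]))

# After the fact, compare automatic estimates against held-out Act II judgments.
print(model.predict([[30, 19.0, 1]]))  # -> ['high'] on this toy data
```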

3.5.1. Experimental Setup

Experiments for developing measures of information quality and user preferences follow the design indicated in Figure 3. Key to this process is the development of a sub-collection which is complex enough to exercise the various interface and fusion techniques to be studied, but small and controlled enough to support sensible experimentation. As shown in Figure 4, this collection will be built on the basis of searching the Web by human subjects working on problems of realistic complexity.

The subjects will be recruited from around the country, permitted to complete their tasks using a web interface, and paid for their efforts at a level somewhat above that of corporate research librarians ($30/hour). They will be set tasks that require them to critically analyze the materials they find, to summarize their findings in reports, and to provide both relevance and quality judgments, on scales to be developed. An example task, if the work were to be done today, might be "assess the factors influencing the Palestinian position on whether or not to declare an independent state on September 15th, and the likely implications for American corporations doing business in the Middle East." The problem could also give rise to a set of questions such as "should corporate personnel be recalled to the States before that date?" or "what is the position of the new King of Jordan on this question?" etc.

To build the collection, we will set the team of searchers to searching the web for relevant information, and collecting "good" URLs into summary pages (one for each searcher). Searchers will be instructed, in this phase of their research (labeled Act I in Figure 4), to scan no more than the first page of results returned by any search engine that they use. In other words, they are to conduct a more "depth-oriented" search, in which following links is given priority over scanning new pages at the same search site. However, our robot will be working in the background to retrieve the next 9 pages turned up by the search engine. These pages will be added to the pages actually viewed by the searchers to form the "Test Collection".

We can estimate the size of this collection at 20 searchers by 40 search-engine query sessions by 100 pages per session, which gives 80,000 pages, plus whatever pages are accumulated during the depth-first process. Assuming each subject works for 10 hours during Act I, this is perhaps another 20 x 600 = 12,000 pages. Thus, for a specific task, a collection of about 100,000 web documents will be assembled. To this will be added some number of "noise" documents chosen more or less at random. This collection will form the basis for our study of quality, retrieval, fusion, and ultimately question answering.

By interviewing users, and interacting with their search sessions, we will develop and refine a set of quality measures, which will be used to control the display and to rank documents for presentation, summarization, etc. The emphasis throughout will be on identifying those kinds of quality measures that can ultimately be automated, so that the system has a good probability of assigning the same value for a given quality metric as would the human user. For some metrics, such as the dateline, this is easy; other metrics, such as the authoritativeness of the site, require developing an index of reputable or disreputable (or biased in a certain direction) sites. Still others, such as the style and gravitas of the page itself, lead us into genre and style identification. We will build on our own preliminary work on punctuation-based genre identification, as well as on established indexes of textual complexity.

3.5.2. Learning User Models and Quality Metrics

Within a specific domain, such as the example given above, we may also be able to learn, using machine learning (ML) techniques, that certain words or phrases are indicators of affiliation, bias, or point of view. All of these approaches will support automated identification of the values that the various quality metrics should assign to a given document. User interviewing will be conducted through a second browser window, and will focus not only on the assignment of quality metric scores to a page, but also on learning which features of the page led to the assignment of that score. Together these activities will generate a vector of quality scores for each document, which we denote by Q(d).

Our experimental design, in the second part of the research, will look for the best values of the coefficients S_ab, which encode the rules for data fusion of relevance scores, for weighting of quality characteristics, and for conversion of both of those into position and iconic information in the user display. The experiments will also look into the question of whether addition or multiplication is the best way to have the user and the system negotiate their preferences into the display.

Specific experiments in this phase of research will form a feedback loop, where the user's responses, judgments, and evaluations of the system feed back into the selection of the fusion scheme, the specific weights, and the negotiation rule. These will use efficient Taguchi designs (Taguchi et al., 1988), which control the amount of user effort by ruling out certain higher-order interaction effects and streamlining the experimental design. Since the exploration of the design space must be guided by intuition, these quick experiments will be a powerful tool for homing in on good display and fusion schemes. The role of the users at this point, which we call Act II, is to use the prototype interface to explore the documents of the test collection, looking for those which, although they did not appear on the first page or during the drill-down process, are valuable in answering the queries posed.

We anticipate that we will be able to have prototype quality estimation, relevance assessment, fusion, and display algorithms in place at the end of 5 months, and will then be able to begin a series of 3 formative studies, each lasting about 2 months and requiring approximately 20 searchers who will serve as the experimental subjects. In each trial all searchers will work on aspects of the same problem and will contribute to pooled evaluation of the retrieved documents.

3.6. Information Visualization and Navigation

The question-answering process in HITIQA will identify relevant documents from which the answer to the user's question can be drawn. The information exchange process will result in many such matching documents. These documents will have varying levels of quality, and they may also contradict each other. Full utilization of the reported documents and the information they contain requires that the user be provided with tools to visually compare, analyze, and select from among the documents. The documents must be visually represented and associated, in conjunction with interaction capabilities to control the parameter values impacting the document selections. The goal is to provide the user with a greater sense of understanding of the information available, along with tools that assist in navigating through the documents and the information space to identify the most important and relevant information.

3.6.1 Multi-Parametric Data Visualization

With the large number of documents potentially matching the user's posed question and specified criteria, it is important to show the user the various factors of each document. For example, if we can present the different quality factors that each document contains, then the user can more accurately select the documents that meet their needs. Other characteristics that should be incorporated into the presentation include:

1. Keywords incorporated into the user's question that appear or do not appear in the document.

2. Question responses validating or invalidating the document.
3. Quality factors and a representation of their values.

Through multi-parametric techniques, this information can be incorporated into a single display, using different visual attributes for each parameter or element. Multi-parametric techniques can be thought of as a form of visual data fusion. Perceptual studies have shown that up to seven visual characteristics are differentiable simultaneously by a user.

While visual data fusion is inherently a separate process from the data fusion capabilities discussed previously, the two must be integrated. In this fashion, the parameters associated with the data fusion computation must be represented visually. The effect of changing the data fusion parameters must be shown so that the user can identify the impact the individual parameters have and select the best parameter values for a given task.

Through user interface components, the user must be able to select from the available parameters. This will allow the user to select the parameters most important for the given task and to explore the various parameters to compose techniques for future tasks.

The visual representation will ultimately take the form of glyphs presented on the screen. Each glyph will represent one or more matching documents. Attributes of the glyph will be representative of an associated parameter. An attribute can merely represent the presence of a parameter, e.g., that a keyword is present in the document; in this simple case we have an on/off mapping. In the more complex case, the attribute will represent the value of the parameter. In this case, we will map the actual value of the parameter to the grayscale intensity of the attribute, i.e., black represents the lowest value and white represents the highest value. Alternatively, we can use a heat colormap in which blue represents a low value and red represents a high value.
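
A minimal sketch of these two value-to-attribute mappings (the linear blue-to-red interpolation is one simple choice of heat map, assumed here for illustration):

```python
# Sketch: map a normalized parameter value v in [0, 1] to a glyph attribute.
def grayscale(v):
    """Black (lowest value) to white (highest value), as an 8-bit RGB triple."""
    g = round(255 * v)
    return (g, g, g)

def heat(v):
    """Blue (low value) to red (high value), linearly interpolated."""
    return (round(255 * v), 0, round(255 * (1 - v)))

print(grayscale(0.0), grayscale(1.0))  # (0, 0, 0) (255, 255, 255)
print(heat(0.0), heat(1.0))            # (0, 0, 255) (255, 0, 0)
```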


3.6.2 Clustering

The sheer number of documents available to the system guarantees that there will be a large number of documents matching a given set of criteria. Through visual clustering we will group documents that are closely related along specified criteria. This is important since all documents will match other documents on one criterion or another. By grouping them together on the criteria the user considers important for the current task, the user can reduce the number of individual items that must be considered. In effect, this reduces document selection to a selection of the appropriate characteristics.

The visual presentation will include a corresponding representation of the number of items in each cluster. The details of a cluster, the number of items, its characteristics, and the parameter values incorporated into it, can be derived by probing the individual cluster representations. The user will then be able to select a cluster and zoom into it to find out details of the individual messages incorporated into it. Another level of selection will allow the user to view individual documents. For large datasets it will be appropriate to have multiple levels of clustering. For example, if clustering at one level does not sufficiently reduce the number of visual items under consideration, then the user will be able to apply additional levels of clustering until the information available is manageable.
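
Multi-level clustering of this kind maps naturally onto hierarchical agglomerative clustering, where each zoom level corresponds to cutting the cluster tree at a different height; a sketch under that assumption (the document features and cut heights are illustrative):

```python
# Sketch: multiple zoom levels as cuts of a hierarchical cluster tree.
# Assumes SciPy; documents are points in some feature space (e.g., their
# quality/relevance vectors).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

docs = np.array([[0.9, 0.8], [0.85, 0.75], [0.2, 0.1],
                 [0.25, 0.15], [0.5, 0.9], [0.55, 0.85]])
tree = linkage(docs, method="average")

# Coarse view: few clusters; zooming in cuts the tree lower, yielding more.
for level, t in enumerate([0.8, 0.3, 0.1], start=1):
    labels = fcluster(tree, t=t, criterion="distance")
    print(f"zoom level {level}: {labels}")
```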

3.6.3 Interaction and Steering

The effectiveness of a visualization environment is only as good as its interface. Merely providing the user with a static, automatically derived visual display has only limited usability. Different types of tasks require the use of different parameter settings to get the best, most desirable results. Settings for one task may not be applicable to another task.

Our goal is to create an effective exploration and analysis environment rather than a static visual display. Any data set being analyzed will have relationships and interdependencies not visible on initial investigation. Through exploration and analysis these characteristics can be made clear. We must identify the dependencies and interrelationships between different messages so that all the important messages for a given task are identified.

Steering allows the user to dynamically change parameter values and observe the effect of those changes in real time during execution. This type of modification provides the user with great control over the environment and its results. It gives the user unparalleled comprehension of the environment, enabling the user to direct its execution toward desired results without wasting time on fruitless paths of exploration.

3.6.4 Selective Parameter Refinement

One of our long-term goals for the environment is to remove the black-box effect characteristic of many analysis environments. Through the visual presentation of the generated results and extensive interaction capabilities, we want the user to get a feel for the effects of the different parameters and how they impact the results. The user should be able to build a set of effective parameterizations for different types of problems and associated questions, so that future analyses will progress more smoothly and quickly.


The HITIQA environment will attempt to generate a series of questions designed to refine the user's initially posed question. Minor changes in the user's answers to these questions can greatly affect the resulting message matches. Interface capabilities that let the user dynamically modify their answers will allow the user to see the effect of those changes. The visualization will show the changes from the completion of one question to the next, as well as between different answers to the same question.

This will generate a series of "what-if" scenarios and allow the user to view the different conclusions reachable under each scenario. To keep the interface simple, we expect that only a limited number of answers to a posed question will be generated. The question generation process itself indicates that the environment has determined there is more than one possible solution to the user's original question; the set of expected solutions can then be derived from the same set of messages from which the question was drawn. This allows the user to select from a set of possible answers using a simple pull-down menu, as sketched below.
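A sketch of this mechanism, under the assumption that the matches implied by each candidate answer can be precomputed; the names here, including match_fn, are illustrative.

    # Hypothetical sketch: candidate answers to a clarifying question are
    # enumerated up front, and the matches implied by each are precomputed,
    # so the pull-down menu simply swaps scenarios.

    def build_scenarios(question, candidate_answers, match_fn):
        """Precompute the document matches implied by each candidate answer."""
        return {answer: match_fn(question, answer)
                for answer in candidate_answers}

    # Selecting from the pull-down menu then reduces to a table lookup:
    #     matches = scenarios[selected_answer]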

3.6.5 Multiple Views

For a given data set, there are many ways the data can be viewed; by changing parameter values the user can view the data in different ways. On one hand, it is important to let the user change the parameters and explore the data set. On the other, it is important to be able to closely compare the varying results derived from different parameter settings. Multiple views allow the user to bring up several representations of a single data set simultaneously, so they can be compared for strengths and weaknesses, furthering the exploration process in a more informed fashion.

3.6.6 Human Perception and the User

It is important to keep the user in mind when developing visualization and interaction techniques. At a fundamental level, we must consider how the human perceptual system will interpret the visual displays we provide, taking care to avoid representations that will be misinterpreted. We must examine the possibility of developing visual techniques that map well onto the pre-attentive capabilities of the human visual system. Characteristics displayed in this way can be interpreted much more readily than with visual techniques that depend on cognitive analysis.

At a higher level, we must determine whether the presentation will be comprehensible to users. Will they understand the information as presented? Will the interaction metaphors make sense and be easily learnable, so that users can take full advantage of the environment?

We must be wary of overloading the user with too many interface controls. The number of parameters and attributes discussed above could make the interface overwhelming; refining the environment will therefore require a feedback loop with potential users.

3.6.7 The Visualization Experience

By removing the black-box perspective, users gain a better understanding of how parameters affect the resulting document selections. This experience lets them use the environment more effectively and gives them an understanding of what the environment's algorithms do. It also engages users in the analysis process, allowing them to explore the data set more fully. This engagement will make the entire analysis process more enjoyable, helping to maintain users' focus and increase their effectiveness.

3.7. Related Research

<refer to and contrast with other significant efforts in Q&A, Quality, Dialogue, Infoviz>

References

Taguchi, Genichi, Elsayed A. Elsayed, Thomas C. Hsiang (1988). Quality Engineering in Production Systems (McGraw-Hill Series in Industrial Engineering and Management Science). New York: McGraw-Hill.

Varshney, P.K. (1997). Scanning the issue. Special issue on data fusion. Proceedings of the IEEE, Vol. 85, No. 1.

Vogt, C.C., Cottrell, G., Belew, R. and Bartell, B. (1997). Using relevancy to train a linear mixture of experts. In D. Harman (ed.), Proceedings of the Fifth Text Retrieval Conference. Washington, DC: GPO.

Ng, K.B. and Kantor, P.B. (2000). Predicting the effectiveness of naive data fusion on the basis of system characteristics. Journal of the American Society for Information Science, November 2000.

Ng, K.B. and Kantor, P.B. (1998). An investigation of the conditions for effective data fusion in information retrieval: A pilot study. Proceedings of the 16th Annual Meeting of the American Society for Information Science (Oct 1998).

4. Statement of Work

The immediate deliverable will be an interactive interface for information analysts, together with suggested parameter settings for data fusion and for the integration of quality factors in a two-dimensional display of potentially relevant documents. The medium-term deliverable will be an automatic system that works with an intelligence analyst and interactively learns the best parameter settings for fusion and quality integration, based on the analyst's actions and the type of problem. The long-term deliverable will add to this a method for coherent summarization of the most valuable documents retrieved from multiple sources.

Task 1. Information Quality Metrics

1.1. Quality Parameters

Establish the initial set of quality criteria important in intelligence analysts' work. Develop a corresponding set of measurable quality indicators that can be computed automatically.


1.2. Web-based Tools

Develop tools for interviewing the user on the web. This includes designing and implementing a graphical user interface and integrating Internet search tools.

1.3. User Experiments

Conduct experiments with users performing specific information tasks. The experiments will be performed in phases with increasingly sophisticated tools.

1.4. Quality Models

Use machine learning methods to derive models for the automated assessment of quality criteria from document features.

Task 2. Interactive Question Answering

2.1. Question Analysis and Classification

Develop methods and tools for question analysis. These will include phrase and concept extraction, collocations, n-grams, name extraction, and syntactic structures. Develop or adapt question classification based on the results of question analysis and user performance data. Integrate into a text indexing and search system. A simple sketch of one such analysis step follows.
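As an illustration only, the sketch below extracts word n-grams from a question; the actual analyzer would layer phrase and concept extraction, collocations, and name extraction on top of such a step.

    # Illustrative n-gram extraction; not the full question analyzer.
    import re

    def ngrams(question, n=2):
        """Return the word n-grams of a question, lowercased."""
        tokens = re.findall(r"[a-z0-9]+", question.lower())
        return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    ngrams("What countries export enriched uranium?")
    # -> ['what countries', 'countries export', 'export enriched',
    #     'enriched uranium']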

2.2. Data-Driven Semantics for Questions

Develop algorithms for subcategorizing the Retrieved Set into sub-themes, sub-topics, aspects and "dimensions". Experiment with granularity (passages, sections, events); seeded and un-seeded clustering; dimensionality reduction.

2.3. Integrate Clarification Dialogue

Integrate the Clarification Dialogue Manager into the main Q&A system.

2.4. Answer Generation

Develop preliminary guidelines for producing the final answer to the user. These include criteria for composition, length, order and layout.

Task 3. Clarification Dialogue

3.1. Quantitative Text Processing

Extend the semantic processing of questions to support dialogue management. Adapt techniques for topic detection, cross-document summarization, and theme clustering.

3.2. Dialogue Management

Develop a Dialogue Manager sub-system to conduct mixed-initiative dialogue with the user. The dialogue manager will be driven by the question semantics, user profile and context information, and answer quality constraints.


3.3. Multimedia Dialogue

Extend the Dialogue Manager to handle both language interaction and point-and-click interaction with visual displays.

Task 4. Information Visualization

Task 5. Data Fusion

5.1. Fusion Parameters

Develop alternative parameter sets for data fusion (based on precision and on scheme difference, allowing for user tuning of weights).

5.2. Automatic Adjustment of Fusion Parameters

Based on the experiments with real subjects and feedback through the interface, develop methods to adjust fusion parameters to adapt to a specific user as searches progress. This applies to combining the results of multiple retrieval schemes, as sketched below.
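A sketch of the kind of linear score fusion whose weights would be tuned, in the spirit of the mixture-of-experts combination of Vogt et al. (1997); the per-scheme max-normalization and weight handling here are our assumptions, not the actual algorithm.

    # Sketch of linearly fused retrieval scores: each scheme contributes a
    # weighted, max-normalized score; the weights are the tunable parameters.

    def fuse(scheme_scores, weights):
        """Combine per-scheme relevance scores into a single ranking.

        scheme_scores: dict scheme_name -> {doc_id: score}
        weights:       dict scheme_name -> float (tunable fusion parameters)
        """
        fused = {}
        for scheme, scores in scheme_scores.items():
            top = max(scores.values(), default=1.0) or 1.0   # normalize per scheme
            for doc_id, score in scores.items():
                fused[doc_id] = (fused.get(doc_id, 0.0)
                                 + weights.get(scheme, 0.0) * score / top)
        return sorted(fused, key=fused.get, reverse=True)    # best documents first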

Task 6. Evaluation and User Studies

6.1. Formative Evaluation

The entire project depends on rapid (2-month) cycles of prototyping and testing to explore the space of design options and move in the most promising directions. In years 2 and 3 we will also move to summative evaluation, to establish the effectiveness of the developed system as a tool to assist the work of intelligence analysts.

Task 7. Project Management

Periodic reporting to the Sponsor, project reviews, project meetings.

5. Schedule and Milestones

Project activities will proceed along two parallel and mutually co-dependent tracks: user experiments and modeling, and system development. We propose a small number (say three) of short development cycles, each lasting just two months and ending with experimental analysis and conclusions, with "semi-real" users recruited from corporate information analysts who would work in their spare time as our subjects. Each subject should spend a minimum of 20 hours with the system, since we are more interested in continued use than in novice or "start-up" issues. The task should be the preparation of a report, or at least an annotated web page linking to relevant data items. We will either judge these ourselves or (even better) have the sponsor do the judging. These activities (5 months of development, 6 months of testing and redesign, and one month to wrap up) will let us demonstrate all the components of our plan, from quality to semantics, from visualization to fusion to Taguchi design.


The first year of research will aim only to develop and refine this complete experimental scheme for finding the best parameters for fusion, for display, and for negotiation. As a byproduct it will produce initial estimates of the fusion parameters and other variables. On the system development front, we will prototype an initial version of semantics-driven Q&A and a preliminary version of the clarification dialogue manager. This proof-of-concept prototype will be demonstrated on a limited range of questions and a fixed data collection, e.g., a TREC subset. The first-year milestones are listed in Table 3.

TABLE 3. FIRST-YEAR MILESTONES

3 Months
  - Multiple retrieval engines up and working
  - Retrieved set sub-categorization designed
  - Dialogue Manager response template designed
  - Online interview forms for aspects of quality
  - Perl scripts for external estimates of page quality

6 Months
  - Perl scripts to fuse data
  - Web interface to gather information on users
  - Web site to collect search results
  - Robot to get additional "back pages"
  - Prototype user interface display
  - Subject recruitment/training aids and tools ready
  - Retrieved set sub-categorization operational
  - Initial experiments with generating clarifying questions

8 Months
  - Flexible fusion algorithms
  - Point/click/drag interface to adjust weights in fusion
  - Automated identification of conflicting points of view
  - Automated identification of some internal quality indicators
  - Subjects recruited and trained
  - Preparation for real-time Taguchi experiments on collected data, using subjects
  - Evaluation starts with clarifying dialogue in initial stage

10 Months
  - Completion of one formative evaluation cycle with subjects
  - Completion of real-time Taguchi studies
  - Begin revisions to system

11 Months
  - Offline Taguchi-type experiments
  - Tuning to give optimum performance based on subject data

12 Months
  - Reports/final documentation


The second year of research will continue this experimentation and advance on two additional fronts. One is the automatic learning of user preferences. Rather than having to conduct a year's worth of experiments, we want a system that observes how it is used by a particular analyst and automatically arrives at a reasonably good set of fusion and display parameters for that analyst. The parameters may also depend on contextual variables such as the urgency or the complexity of the problem.

The other is the automatic synthesis of answers to questions, based on what is learned about how to identify the most relevant documents. This is a difficult and open-ended task, and will continue to the end of the project. We can sketch some of the subtasks (such as learning to automatically distinguish opposing points of view), but the synthesis of answers to questions is complex. Within the scope of this project we expect to achieve TREC-like performance, where passages 50 words long are delivered as "answers" to a query. This work will depend on question classification and on natural language processing.

In the second year, the proof-of-concept prototype will be expanded into a full-scale HITIQA prototype and evaluated on a new data collection, preferably an open-ended resource. Work on question semantics will be expanded into question classification and the use of analogy to answer new questions. Work on dialogue will expand to include both language and visual dialogue.

The third year of research will continue refining the HITIQA prototype, further extending its capabilities.

6. Personnel Qualifications

The success of the HITIQA project hinges on having the right people doing the work. At SUNY Albany and Rutgers we have outstanding researchers in natural language processing, information retrieval, data fusion, information quality, and information visualization: scientists whose experience and expertise will be dedicated to the success of HITIQA. The key researchers are listed in Table 2. None of the key personnel have other commitments that could prevent them from meeting their HITIQA commitments. Furthermore, all of the key contributors are personally invested in the proposed technologies and will spend the remainder of their time working on closely related projects. All of the key personnel have extensive experience in the technical areas critical to HITIQA, as illustrated by their resumes below.

TABLE 2. KEY PERSONNEL

Name + Function                    Organization    Clearance  Expertise
Prof. Tomek Strzalkowski, PM, PI   SUNY Albany     none       Applied NLP, information retrieval, query expansion, summarization
Prof. Rong Tang                    SUNY Albany     none       Information systems, quality assessment
Prof. Paul Kantor, co-PI           Rutgers Univ.   DISCO      Information retrieval, information fusion, interactive IR
Prof. K.B. Ng                      Queens College  none       Information fusion, user studies
Prof. Rob Erbacher                 SUNY Albany     none       Information visualization, computer graphics, vis. tools and techniques
Prof. George Berg                  SUNY Albany     none       Machine learning, neural networks, computational linguistics

Resumes

PROFESSOR TOMEK STRZALKOWSKI – University at Albany, SUNY

Education:
  o Simon Fraser University, PhD Computer Science, 1986.
  o Warsaw University, MSc Computer Science, 1981.

Experience (20 years):
Dr. Strzalkowski is an Associate Professor of Computer Science at SUNY Albany. Prior to joining SUNY, he was the Natural Language Group Leader and a Principal Computer Scientist at GE CRD. Prior to GE, he was an Assistant Professor of Computer Science at New York University. He received his PhD in Computer Science from Simon Fraser University in 1986 for work on the formal semantics of discourse. He has done research in a wide variety of areas in computational linguistics, including database query systems, formal semantics, and reversible grammars. He has directed research projects in natural language processing and information retrieval sponsored by DARPA and NSF, including work under several TIPSTER contracts. While at GE, he developed advanced text summarization systems for the Government. Dr. Strzalkowski has published over a hundred scientific papers on computational linguistics and information retrieval. He is the editor of two books: Reversible Grammar in Natural Language Processing and Natural Language Information Retrieval.

Dr. Strzalkowski's professional activities involve active participation in applied natural language processing, automated summarization, and information retrieval. He sits on the Program Committee of the Text Retrieval Conference (TREC), is co-chair of the Question-Answering track, and is a former chair of the Natural Language track. He chaired the organizing committee of the 1991 workshop on Reversible Grammars in Natural Language Processing. He has been actively involved in Government-sponsored technology evaluations including TREC, SUMMAC, TDT, and PARSEVAL. He serves on program committees and review panels for various conferences including ACL, Coling, ANLP, SIGIR, PACLING, SNLP and others. He also serves on the industry advisory board for the NY State-Columbia University Center for Advanced Technology.

7. Facilities (ALL)

<facilities available: probably cut out from previous proposals>

