
Honto? Search: Estimating Trustworthiness of Web Information by Search Results Aggregation and Temporal Analysis

Yusuke Yamamoto, Taro Tezuka, Adam Jatowt, and Katsumi Tanaka

Department of Social Informatics, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo, Kyoto 606-8501, Japan

{yamamoto,tezuka,adam,tanaka}@dl.kuis.kyoto-u.ac.jp

Abstract. If the user wants to know the trustworthiness of a proposition, such as whether “the Japanese Prime Minister is Junichiro Koizumi” is true or false, conventional search engines are not appropriate. We therefore propose a system that helps the user determine the trustworthiness of a statement that he or she is unsure about. In our research, we estimate the trustworthiness of a proposition by aggregating knowledge from the Web and analyzing the creation time of web pages. We propose a method to estimate popularity from a temporal viewpoint by analyzing how many pages discussed the proposition in a certain period of time and how continuously it appeared on the Web.

1 Introduction

People often become unsure about a statement they encounter on the Web, for example “the Japanese Prime Minister is Junichiro Koizumi” or “dinosaurs became extinct 65 million years ago”. In such a case, the user often types the statement into a search engine and examines how common it is on the Web, or checks whether there are any contradicting answers. This is, however, a time-consuming task.

We therefore propose a system that helps the user determine the trustworthiness of a topic that he or she is unsure about. We named our system “Honto? Search”; “Honto?” means “Is it really?” in Japanese. We focus on assisting the user in making a judgment on trustworthiness, rather than making an absolute decision.

There are various criteria for the trustworthiness of information: the level of popularity, the author's reliability, or the consistency of the content itself. In our research, however, we use popularity as the criterion. In other words, our system provides the user with an estimate of the popularity of a phrase and of its alternatives or counter examples occurring on the Web. The system also presents changes in the frequency of these phrases over time, to help the user decide whether the original phrase is up to date, or whether it has been stated continuously over a long span of time, which supports its reliability.

Fig. 1. Honto? Search: System Overview

To provide such functionality, the system performs the following procedure. First, it divides the phrase given by the user into parts and constructs a query to be sent to a web search engine. Second, it extracts alternatives or counter examples to the original phrase from the search results. Third, the system sends the original phrase and the counter examples to the web search engine again and obtains their current frequencies. Fourth, it sends the phrases to a web archive and obtains the temporal change in their frequencies. The result is presented to the user as a list of phrases and a graph indicating the temporal change.

The rest of the paper is organized as follows. Section 2 describes related work. Section 3 presents the method of extracting counter examples from the Web. Section 4 presents the method of temporal analysis using a web archive. Section 5 describes the results of the experiments made to validate the effectiveness of our approach. Lastly, Section 6 concludes the paper.

2 Related Work

2.1 Web QA

Web question answering (Web QA) systems are similar to our system in that they retrieve text segments from the Web to answer the user's information requests. However, their goals differ from ours. Web QA systems use the Web to find an answer to a question posed by the user [1, 11], and most of them go no further than finding that answer. For example, TREC's question answering track provides a benchmark for evaluating how well systems find answers to user-specified questions¹. It is assumed that the answer is reliable and that the user is satisfied once he or she gets it. In reality, however, this is often not the case on the Web, since it contains a lot of unreliable and obsolete information. In our system, the user does not type in an interrogative sentence. Instead, the user types in a phrase whose trustworthiness he or she doubts. The goal is to extract additional information from the Web to help the user judge the trustworthiness of the proposition contained in the phrase.

¹ TREC Question Answering Track, http://trec.nist.gov/data/qa.html

There is a recent trend in Web QA systems to exploit the redundancy of information found on the Web [4, 5, 10]. Such systems aggregate phrases and present the most frequent one as the answer. This has been made possible by the tremendous size of the Web compared to traditional QA source data (corpora). Although these systems are similar to ours in that they exploit the redundancy of information on the Web, their aim and approach differ: they aim to answer a question, whereas our system aims to help the user assess the trustworthiness of a given proposition.

2.2 Term Dynamics

Our system uses temporal changes in the frequencies of phrases to filter out obsolete information. Kleinberg surveyed recent approaches for analyzing term dynamics in text streams [9]. Kleinberg's “word burst” is a well-known method for examining changes in word frequencies over time [8]. It is a state-based approach that models term dynamics as transitions between two states: a low-frequency state and a high-frequency state. Kleinberg's work, however, was intended to model significant bursts of terms in text streams, whereas our system compares the frequencies of phrases with one another and examines how long they persist over time. Kizasi² is an online system that extracts keywords that have recently become popular. This system handles only keywords, whereas our method handles phrases.

² Kizasi, http://kizasi.jp/ (in Japanese)

2.3 Web Archives

We propose using web archives to estimate the popularity of propositions on the Web over time. Web archives preserve the history of the Web and indirectly reflect the past states and knowledge of societies. Until now, however, the Web archiving community has mostly concentrated on harvesting, storing and preserving Web pages, as these tasks pose numerous challenges. Relatively little attention has been paid to using Web archive data, despite the fact that it offers great potential for knowledge discovery. Aschenbrenner and Rauber discussed possible benefits of, and approaches to, adapting traditional Web mining tasks to Web archives [3]. Recently, Arms et al. reported on ongoing work aiming to build a research platform for studying the evolution of the content and structure of the Web [2]. We believe that the successful completion of similar projects in the future will enable more effective knowledge discovery from the history of the Web.

2.4 What Honto? Search is Not

The following list indicates some of the functions that Honto? Search does not provide.

Keyword search: In Honto? Search, the user query is a phrase. This differs from conventional web search engines, where the user inputs keywords.

Page search: The search results of Honto? Search are the relative frequencies of the query phrase in comparison to alternative phrases. The system presents aggregated knowledge instead of lists of web pages, as conventional search engines do.

Opinion miner: The main target of Honto? Search is factual information. It is not intended to collect people's opinions on topics for which no definite answer exists.

3 Collecting Alternative Propositions

In this section we describe how to identify alternative propositions, or counter examples, for a phrase query given by the user. More generally, the system finds terms that fit into a specific part of the user's query. For example, for the phrase “dinosaurs became extinct 65 million years ago”, the user may want to check whether “65 million” is actually true. In this case, the system searches the Web to find other terms that complete the expression, such as “60 million”, “70 million”, or even “10 thousand” (which is wrong). We call such terms alternative terms, and a phrase containing one is called an alternative proposition.

In Honto? Search, the user selects the part of a phrase that he or she is unsure about. We call this part the verification target. Terms that would replace the verification target in the phrase are alternative terms. If the user does not specify a verification target, the system applies the procedure to each word in the phrase.

The system performs the following procedure to identify alternative propositions, as illustrated in Fig. 2.

1. The phrase is split into two parts at the verification target T. For example, if the user inputs “dinosaurs became extinct 65 million years ago” and the target T is “65 million”, we get two parts, P1: “dinosaurs became extinct” and P2: “years ago”.

2. The system sends the query “P1 & P2” to a web search API. Alternative terms are extracted from the search results with the regular expression /P1 (.*) P2/. The text segment between P1 and P2 is extracted as an alternative term, as long as it is contained in one sentence and differs from the original verification target.

3. Alternative terms are sorted by frequency: the more web pages a term appears on, the higher it is ranked. Terms whose frequency falls below a threshold are eliminated. The initial set of alternative terms may contain much noise, but this sorting and filtering reduces it.

An alternative proposition is constructed by inserting an alternative term between the two parts of the original phrase, P1 and P2. Each alternative proposition is then sent to the search engine again to obtain its frequency.
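The three steps above, together with the construction of alternative propositions, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the system's actual code: `pages` stands in for page texts or snippets already returned by the search API, sentence splitting is a naive punctuation split, the pattern /P1 (.*) P2/ is made non-greedy and case-insensitive so that one match cannot swallow a longer span, and the default `min_ratio=0.15` borrows the 15% cut-off reported in Section 5. All function names are hypothetical.

```python
import re
from collections import Counter


def split_phrase(phrase, target):
    """Step 1: split the phrase at the verification target T into (P1, P2)."""
    p1, sep, p2 = phrase.partition(target)
    if not sep:
        raise ValueError("verification target not found in phrase")
    return p1.strip(), p2.strip()


def extract_alternative_terms(pages, p1, p2, target, min_ratio=0.15):
    """Steps 2-3: collect terms matching P1 (.*?) P2, rank by page count, filter."""
    pattern = re.compile(re.escape(p1) + r"\s*(.+?)\s*" + re.escape(p2), re.IGNORECASE)
    counts = Counter()
    for text in pages:
        terms_on_page = set()
        # Naive sentence split, so every match stays inside a single sentence.
        for sentence in re.split(r"[.!?。]", text):
            for match in pattern.finditer(sentence):
                term = match.group(1).strip()
                if term.lower() != target.lower():  # skip the original target
                    terms_on_page.add(term)
        counts.update(terms_on_page)  # each page counts at most once per term
    if not counts:
        return []
    top = counts.most_common(1)[0][1]
    # Drop terms whose page count is below min_ratio of the most frequent term.
    return [(term, c) for term, c in counts.most_common() if c >= min_ratio * top]


def alternative_propositions(p1, p2, ranked_terms):
    """Insert each alternative term between P1 and P2."""
    return [f"{p1} {term} {p2}" for term, _ in ranked_terms]


p1, p2 = split_phrase("dinosaurs became extinct 65 million years ago", "65 million")
pages = [  # toy snippets, not real search results
    "Dinosaurs became extinct 65 million years ago. Many agree.",
    "Some say dinosaurs became extinct 60 million years ago.",
    "dinosaurs became extinct 60 million years ago. Really? dinosaurs became extinct 60 million years ago.",
    "We read that dinosaurs became extinct 70 million years ago!",
]
terms = extract_alternative_terms(pages, p1, p2, "65 million")
print(terms)  # [('60 million', 2), ('70 million', 1)]
print(alternative_propositions(p1, p2, terms))
```

Counting each term once per page, rather than once per occurrence, follows the text's "the more web pages a term appears on, the higher it is ranked".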

One current limitation of Honto? Search is that it cannot handle statements containing negations. Fortunately, such statements are relatively infrequent compared to positive ones, so the aggregated results are usually still good.

If the web search API returns too few results, the search is repeated with a relaxed query. The system extracts nouns, verbs, and adjectives from each alternative proposition and performs a multiple-keyword search, constructing the query by connecting the keywords with “&”. In the example, the query will be “dinosaurs & became & extinct & 65 & million & years”. The system then searches the retrieved web pages for sentences that contain all the keywords.
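The relaxed search can be sketched as below. Honto? Search uses a morphological analyzer to pick out nouns, verbs and adjectives; this sketch has no analyzer, so it substitutes a small hypothetical stop-word list (`FUNCTION_WORDS`) and keeps every other word as a keyword. Keyword matching is a simple case-insensitive whole-word check per sentence. It illustrates the idea under these assumptions and is not the system's implementation.

```python
import re

# Hypothetical stand-in for part-of-speech filtering: words in this set are
# treated as function words and dropped; everything else is kept as a keyword.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "was", "there", "ago"}


def content_words(phrase):
    """Return the phrase's keywords in their original order (lower-cased)."""
    return [w for w in re.findall(r"\w+", phrase.lower()) if w not in FUNCTION_WORDS]


def relaxed_query(phrase):
    """Connect the keywords with '&', mirroring the example in the text."""
    return " & ".join(content_words(phrase))


def sentences_with_all_keywords(text, keywords):
    """Return the sentences of a retrieved page that contain every keyword."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"\w+", sentence.lower()))
        if all(k in words for k in keywords):
            hits.append(sentence.strip())
    return hits


phrase = "dinosaurs became extinct 65 million years ago"
print(relaxed_query(phrase))  # dinosaurs & became & extinct & 65 & million & years
page = "About 65 million years ago, dinosaurs became extinct. Birds survived."
print(sentences_with_all_keywords(page, content_words(phrase)))
```

The sentence-level check is what lets the relaxed search find reworded sentences such as the one above, whose word order differs from the original phrase.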

This step allows the system to retrieve phrases with the same meaning expressed in different forms. Although the method is vulnerable when many sentences contain negations, expectations, or interrogatives, we assume that, since the results are presented as a list, the user can check them by reading the snippets linked from each alternative phrase.

The frequencies are then presented to the user. By comparing the frequency of the original phrase with the frequencies of the alternative propositions, the user can get an idea of how strongly the phrase is supported on the Web.

4 Analysis of Generation Time of Propositions

Because the sentences collected by the approach proposed in Section 3 were generated at different times, it is not appropriate to use them for a majority decision without careful consideration. For example, consider the proposition “the host city of the next Summer Olympic Games is Beijing”. This proposition is correct only until the event is held in 2008. This example shows that the trustworthiness of a proposition can depend strongly on time.

In addition, if the system considers temporal information, it can estimate the trustworthiness of a proposition from another aspect as well: it can evaluate how continuously the proposition has been accepted over time.

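We therefore define two criteria for determining trustworthiness: the relative frequency of a proposition in a certain period (Section 4.2) and its continuity over time (Section 4.3). Fig. 3 illustrates this procedure.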

4.1 Analysis of Creation Time of Web Pages

Fig. 2. Knowledge Aggregation Procedure

Fig. 3. Temporal Analysis Procedure

To analyze the temporal distribution of web pages relevant to a proposition, we need to determine when each page was created. The system uses the Internet Archive³, a well-known public web archive offering about 2 petabytes of data. Access to the Internet Archive is provided by the Wayback Machine, which allows viewing snapshots of pages from the past. Using the Wayback Machine, it is possible to reconstruct the histories of web pages: given the URL of a page, the Wayback Machine returns the list of available page snapshots together with their timestamps. We estimate the creation time of a page by taking the oldest timestamp in this list. Consequently, it is possible to analyze the temporal distribution of the creation times of pages that refer to a given proposition.

³ Internet Archive, http://www.archive.org/web/web.php
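The creation-time estimate above amounts to taking the minimum of a page's snapshot timestamps. The sketch below assumes the snapshot list has already been retrieved (for instance from the Wayback Machine's CDX server, which reports 14-digit `YYYYMMDDhhmmss` timestamps) and only shows the estimation and the binning into monthly periods; fetching and error handling are omitted, and the function names are hypothetical.

```python
from datetime import datetime


def estimate_creation_time(snapshot_timestamps):
    """Approximate a page's creation time by its oldest archived snapshot."""
    if not snapshot_timestamps:
        return None  # page never archived: creation time unknown
    return min(datetime.strptime(ts, "%Y%m%d%H%M%S") for ts in snapshot_timestamps)


def creation_month(snapshot_timestamps):
    """Return the (year, month) period used for counting pages, or None."""
    created = estimate_creation_time(snapshot_timestamps)
    return None if created is None else (created.year, created.month)


snapshots = ["20040315120000", "20021101083000", "20050101000000"]
print(estimate_creation_time(snapshots))  # 2002-11-01 08:30:00
print(creation_month(snapshots))          # (2002, 11)
```

Because crawlers detect pages only after some delay, the oldest snapshot can only be at or after the true creation time, so this estimate errs on the late side.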

Using the Internet Archive for temporal analysis, however, has several limitations. First, the temporal scope of page snapshots is constrained: crawling of Web pages started in 1996, and, due to the Internet Archive's policy, no data less than 6 months old is provided. Second, after closer inspection we discovered gaps in the data caused by uneven crawling in the past. Third, the page creation dates estimated by our method are only approximations of the actual origin dates of pages, since there is always some delay between the creation of a page and its detection by Web crawlers. Lastly, it is uncertain whether a proposition occurring on a page actually appeared at the time of page creation rather than later, through subsequent updates to the page content. Nevertheless, since our approach analyzes a relatively large number of pages and uses relative frequencies of phrases, satisfactory results can still be obtained despite these limitations.

4.2 Trustworthiness of the Proposition in a Certain Period

In this section, we propose a way of estimating the trustworthiness of a proposition in a specific period by comparing the temporal distribution of the proposition with that of alternative propositions.

First, we define PF_A(t), the Proposition Frequency (PF) of proposition A at time period t.

Proposition Frequency of proposition A at time period t
PF_A(t): the number of web pages that refer to proposition A and were generated in time period t

Using PF, we can estimate which of a set of alternative propositions is the most reliable in a certain period. That is, to estimate whether proposition A or proposition B is more reliable in time period t, we only have to compare PF_A(t) with PF_B(t). If PF_A(t) is greater than PF_B(t), we estimate that proposition A is more likely to be true than proposition B in period t. We calculate PF(t) for the original proposition, which the user inputs into our system, and for the alternative propositions that our system derived from it, and we identify the proposition with the greatest PF(t) as the most reliable proposition in period t.
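Computing PF and picking the most reliable proposition per period can be sketched as follows, assuming each proposition's pages have already been mapped to (year, month) creation periods, for example with an oldest-snapshot estimate. The data are a toy example, not the paper's measurements, and ties go to the first proposition listed, a choice the paper does not specify.

```python
from collections import Counter


def proposition_frequency(creation_periods):
    """PF(t): how many pages referring to the proposition were created in period t."""
    return Counter(p for p in creation_periods if p is not None)


def most_reliable(pf_by_proposition, period):
    """Return the proposition whose PF is largest in the given period."""
    return max(pf_by_proposition, key=lambda prop: pf_by_proposition[prop].get(period, 0))


# Toy data: creation periods of pages found for two competing propositions
# (None marks a page whose creation time could not be estimated).
pf = {
    "there are 15 countries in the EU": proposition_frequency([(2003, 5), (2003, 5), (2004, 6), None]),
    "there are 25 countries in the EU": proposition_frequency([(2004, 6), (2004, 6), (2004, 7)]),
}
print(most_reliable(pf, (2003, 5)))  # there are 15 countries in the EU
print(most_reliable(pf, (2004, 6)))  # there are 25 countries in the EU
```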

The number of newly generated web pages usually fluctuates considerably over short spans, forming a zigzag line over time. To account for this, we smooth PF using the values from a surrounding period, adopting the moving average for this purpose [7].

Modification of PF_A(t) with a moving average

$$PF_A(t) = \frac{1}{2n+1} \sum_{i=-n}^{n} PF'_A(t+i) \qquad (1)$$

Here PF'_A(t) is the original value of the proposition frequency, which we smooth around time period t with a window of size 2n + 1. By comparing the PF values smoothed by the moving average, the system finally estimates the trustworthiness of a proposition in a certain period.
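Equation (1) is a centered moving average. A minimal Python version follows. How the first and last n periods are treated, where the full window of 2n + 1 values does not exist, is not stated in the paper, so this sketch assumes the average is taken over the periods that are available.

```python
def moving_average(values, n):
    """Centered moving average of a PF series with window size 2n + 1."""
    smoothed = []
    for t in range(len(values)):
        # Assumption: truncate the window at the series boundaries.
        window = values[max(0, t - n): t + n + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


print(moving_average([0, 3, 0, 3, 0], n=1))  # [1.5, 1.0, 2.0, 1.0, 1.5]
```

With n = 6, the value used in Section 5, each smoothed monthly value averages 13 months of raw counts, which removes the short-term zigzag but also spreads a sudden change over roughly six months on either side.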

4.3 Proposition Continuity

Honto? Search employs temporal analysis not only to select the most recently popular proposition (as described in Subsection 4.2), but also to inform the user whether the proposition has appeared on the Web for a long enough period of time. The underlying assumption is that such information is helpful in determining the trustworthiness of the proposition. For example, the proposition “aluminum is the cause of Alzheimer's disease” was once a popular theory and was widely discussed on the Web, but much less so now. On the other hand, the proposition “Alzheimer's disease causes dementia” still commonly appears on the Web. Presenting the temporal change in the frequencies of the two propositions would help the user judge the reliability of each.

The difference between the two can be formalized as follows. For a proposition that remains reliable over time, web pages referring to it are generated constantly. For a proposition that was reliable only during a limited period of time, the number of new web pages containing it decreases once it ceases to be reliable. To distinguish between the two cases, we draw on a theory from psychology.

According to Hermann Ebbinghaus, a person's memory decays exponentially [6]. The amount of remaining memory R at time t is described as follows, using a coefficient γ:

$$\frac{d}{dt}R(t) = -\gamma R(t)$$

Based on this theory, we built the following model. When the number of new web pages containing the proposition in a month is at least λ times the number in the prior month, we judge that the proposition is still attracting people's attention. If it is less than λ times the number in the prior month, we judge that the proposition has entered the receding phase: it is losing people's attention and will eventually be forgotten by the public. λ is a threshold value that can be adjusted experimentally.
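The link between the Ebbinghaus model and the threshold λ can be made explicit. The following is our reading of the model, not a derivation given in the paper: solving the differential equation shows that, under pure exponential decay, the ratio between consecutive months is constant, so comparing that ratio with λ is equivalent to comparing the decay rate with a fixed cut-off.

```latex
R(t) = R(0)\,e^{-\gamma t}
\quad\Longrightarrow\quad
\frac{R(t)}{R(t-1)} = e^{-\gamma}
\qquad\text{so}\qquad
\frac{R(t)}{R(t-1)} < \lambda \iff \gamma > -\ln\lambda .
```

Under this reading, λ = 0.8 (the value used in Section 5) flags a proposition as receding when its monthly decay rate exceeds -ln 0.8 ≈ 0.223.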

We define proposition continuity as a measure of how long a proposition has been attracting people's attention. We assume that the number of new web pages containing the proposition reflects people's attention to it.

PC_A(t) denotes the proposition continuity of a proposition A at time t. It is defined as follows:

Proposition continuity of a proposition A at time t

$$PC_A(t) = \begin{cases} PC_A(t-1) + PF_A(t) & \text{if } PF_A(t) \ge \lambda\, PF_A(t-1) \\ \alpha\, PC_A(t-1) + PF_A(t) & \text{if } PF_A(t) < \lambda\, PF_A(t-1) \end{cases} \qquad (2)$$

λ is the threshold value for detecting the receding phase. α is a coefficient (less than 1) that makes PC decrease exponentially while PF is dropping sharply. If the number of new web pages containing proposition A is at least λ times that of the prior month, PC_A(t) increases by PF_A(t). If the ratio falls below λ, PC_A(t) drops rapidly, because the accumulated value is scaled down by α; we assume that the proposition has then entered the receding phase.
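Equation (2) translates directly into a loop over a PF series. The sketch below uses λ = 0.8 and α = 0.5, the values from Section 5, as defaults. The paper does not say how the first period is initialized, so we assume PC starts at that period's PF; the input may be raw or moving-average-smoothed PF values.

```python
def proposition_continuity(pf, lam=0.8, alpha=0.5):
    """Compute PC(t) for every period of a PF series, following equation (2)."""
    if not pf:
        return []
    pc = [pf[0]]  # assumption: no earlier history, so PC starts at the first PF
    for t in range(1, len(pf)):
        if pf[t] >= lam * pf[t - 1]:
            pc.append(pc[-1] + pf[t])          # attention sustained: accumulate
        else:
            pc.append(alpha * pc[-1] + pf[t])  # receding phase: damp the history
    return pc


print(proposition_continuity([10, 12, 5, 6]))  # [10, 22, 16.0, 22.0]
```

In the example, the drop from 12 to 5 pages falls below 0.8 × 12 = 9.6, so the accumulated value 22 is halved before the 5 new pages are added.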

Honto? Search presents PC to the user as an indicator of how consistently the proposition has been supported by the public.

5 Experiment

In this section, we describe the results of experiments that tested the effectiveness of our approach in estimating the trustworthiness of propositions.

5.1 Discovery of Alternative Propositions and Aggregating Sentences

To obtain alternative propositions, we used the Yahoo! Web Search APIs⁴, a web service for searching Yahoo!'s index, and retrieved 1,000 results for each proposition. We collected only web pages in Japanese. From the snippets of the search results, we extracted alternative terms using the method described in Section 3. We then counted the frequency of each alternative term and eliminated those whose frequency was lower than 15% of that of the most frequent one, since we assumed that terms with currently low frequencies are not appropriate as alternative terms.

After creating alternative propositions containing the alternative terms, we used a Japanese morphological analyzer, Mecab⁵, to extract nouns, verbs and adjectives from the snippets (brief summaries of search results). Finally, we collected 1,000 web pages for each alternative query and aggregated them using the procedure described in Section 3.

We performed experiments on two propositions, “there are 15 countries in the European Union” (Example 1) and “the President of China is Hu Jintao” (Example 2). The verification targets were “15” and “Hu Jintao”, respectively.

Table 1. Estimation of Trustworthiness of Propositions

Example 1: “There are 15 countries in the European Union.”
  alternative term    frequency
  25                  187
  15                  156
  10                  141

Example 2: “The President of China is Hu Jintao.”
  alternative term    frequency
  Hu Jintao           589
  Jiang Zemin         574

Table 1 lists the frequencies of the original and alternative propositions in the web search results. For example, for the proposition “there are 15 countries in the European Union”, we obtained two alternative terms, “25” and “10”. The most frequent one was “25”, which is the correct answer. The original term “15” also produced many results, since it was true until 2004. The alternative term “10” was also frequently reported on the Web, probably from expressions such as “10 new countries in the European Union”. The user can judge that the original proposition may not be trustworthy, since it is not the most frequent one.

For the proposition “the President of China is Hu Jintao”, we obtained the alternative proposition “the President of China is Jiang Zemin”, which was true until 2003. From the table, the user can judge that “the President of China is Hu Jintao” is reliable, since it is the most frequent one (589 vs. 574). Note that these are simple estimations that do not consider the temporal aspect.

⁴ Yahoo! Web Search APIs, http://developer.yahoo.com/search/web/V1/contextSearch.html
⁵ Mecab, http://mecab.sourceforge.jp/ (in Japanese)

Fig. 4. The precision of our system for a test collection

Furthermore, we performed a more thorough experiment using a list of historical events (a historical time table)⁶ as a test set, to see whether the system judges these events as “true” when compared with other, mistaken information on the Web.

The time table contained 360 major historical events dating from 3000 B.C. to 2003. We constructed test phrases by combining each event name with the year in which it occurred, e.g. “the moon landing by Apollo 11 in 1969”, with the year as the verification target. A search query was constructed by replacing the year with a wildcard, e.g. “the moon landing by Apollo 11 in *”. Out of 360 queries, 116 returned search results. The system then collected other candidates specifying different years (e.g. “the moon landing by Apollo 11 in 1970”) and ranked them by their frequencies on the Web. We checked whether the correct answers were ranked high compared with the other candidate phrases.

Fig. 4 illustrates the results of the experiment. 62% of the correct phrases were ranked first by the system, 9% were ranked 2nd, and 3% were ranked 3rd, while the remaining 26% were ranked 4th or below.
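The ranking and the rank distribution summarized in Fig. 4 can be outlined with the sketch below. It assumes snippets have already been retrieved for the wildcard query, extracts the number that follows the fixed part of the phrase as a candidate year, and buckets the rank of the correct year into 1st, 2nd, 3rd and "4th or below". Counting queries whose correct year never appears in the last bucket is our assumption. The snippets and ranks are toy data, and the function names are hypothetical.

```python
import re
from collections import Counter


def rank_years(snippets, prefix):
    """Rank candidate years appearing right after `prefix`, most frequent first."""
    pattern = re.compile(re.escape(prefix) + r"\s*(\d{1,4})\b")
    counts = Counter(m.group(1) for s in snippets for m in pattern.finditer(s))
    return [year for year, _ in counts.most_common()]


def rank_of(correct, ranked):
    """1-based rank of the correct year, or None if it was never found."""
    return ranked.index(correct) + 1 if correct in ranked else None


def rank_distribution(ranks):
    """Fraction of queries whose correct answer was ranked 1st, 2nd, 3rd, or lower (4)."""
    if not ranks:
        return {}
    buckets = Counter(r if r is not None and r <= 3 else 4 for r in ranks)
    return {k: buckets[k] / len(ranks) for k in (1, 2, 3, 4)}


snippets = [
    "the moon landing by Apollo 11 in 1969 was watched worldwide",
    "Remember the moon landing by Apollo 11 in 1969.",
    "the moon landing by Apollo 11 in 1970",
]
ranked = rank_years(snippets, "the moon landing by Apollo 11 in")
print(ranked, rank_of("1969", ranked))        # ['1969', '1970'] 1
print(rank_distribution([1, 1, 2, 5, None]))  # {1: 0.4, 2: 0.2, 3: 0.0, 4: 0.4}
```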

5.2 Analysis of Page Creation Times

Based on the method discussed in Section 4, we calculated PF and PC for the original and alternative propositions in order to examine their trustworthiness from the temporal point of view. We used the Internet Archive to estimate when each web page was created, considering pages created between 1998 and the present. Because the Internet Archive does not make available pages collected within the last 6 months, we could only use data up to July 2006. For the moving average we used n = 6 (a window of 13 periods). For calculating PC, we used λ = 0.8 and α = 0.5.

Fig. 5 shows that the PF of “there are 25 countries in the European Union” far exceeds that of the other two propositions around 2004. The user can guess that the number of countries in the European Union possibly changed at that time.

⁶ http://www.h3.dion.ne.jp/~urutora/sekainepeji.htm (in Japanese)

Fig. 5. Proposition Freq. for Example 1
Fig. 6. Proposition Freq. for Example 2

Fig. 7. Proposition Cont. for Example 1
Fig. 8. Proposition Cont. for Example 2

Fig. 6 shows that although the PF of the proposition “the President of China is Jiang Zemin” is higher than that of “the President of China is Hu Jintao” at the beginning, the two are reversed around 2003. In fact, Jiang Zemin was the President of China until March 2003, and Hu Jintao has been president ever since.

Fig. 7 shows that for the proposition “there are 15 countries in the European Union”, PC decreases at the end, indicating that the proposition is no longer true.

Fig. 8 shows that PC for “the President of China is Hu Jintao” continues to increase, while PC for “the President of China is Jiang Zemin” decreases at one point, reflecting the change in the presidency.

6 Conclusion

When users want to know whether a proposition is true or false, no available system estimates the trustworthiness of that proposition. We have therefore proposed a method that estimates the trustworthiness of a proposition, including from the viewpoint of time, by aggregating web search results and using a web archive.

By analyzing when web pages were generated, we were able to determine whether a proposition was true or false during a certain period, and whether it has been true or false from the past until now. Our approach has several problems. First, we do not distinguish positive sentences from negative ones. Second, the temporal analysis depends on the Wayback Machine, so if that service ceases, we cannot precisely determine when pages were generated. In addition, the alternative terms collected as described in Section 3 can be temporally biased by Yahoo! Search: if the top 1,000 results are mostly recent pages, we cannot obtain older alternative terms.

Our method is a kind of knowledge discovery process. We think that aggregating web knowledge can be applied not only to estimating the trustworthiness of a proposition but also to other problems. Our future work includes reducing noise and estimating the trustworthiness of whole web pages rather than only parts of them.

Acknowledgement

This work was supported in part by the MEXT Grant for “Development of Fundamental Software Technologies for Digital Archives”, Software Technologies for Search and Integration across Heterogeneous-Media Archives (Project Leader: Katsumi Tanaka), the Grant-in-Aid for Young Scientists (B) “Trust Decision Assistance for Web Utilization based on Information Integration” (Leader: Taro Tezuka, Grant#: 18700086) and the Grant-in-Aid for Young Scientists (B) “Information Discovery Using Web Archives” (Grant#: 18700111).

References

1. Andrenucci, A. and Sneiders, E., Automated Question Answering: Review of the Main Approaches, 3rd International Conf. on Information Technology and Applications, pp. 514-519, 2005.

2. Arms, W. Y., Aya, S., Dmitriev, P., Kot, B. J., Mitchell, R. and Walle, L., Building a research library for the history of the Web, Joint Conf. on Digital Libraries, Chapel Hill, NC, USA, pp. 95-102, 2006.

3. Aschenbrenner, A. and Rauber, A., Mining web collections. In Web Archiving, Springer Verlag, Berlin Heidelberg, Germany, 2006.

4. Brill, E., Lin, J., Banko, M., Dumais, S., and Ng, A., Data-Intensive Question Answering, TREC 2001, pp. 393-400, 2001.

5. Dumais, S., Banko, M., Brill, E., Lin, J. and Ng, A., Web Question Answering: Is More Always Better?, 25th Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 291-298, Tampere, Finland, 2002.

6. Ebbinghaus, H., Memory: A Contribution to Experimental Psychology, Thoemmes Press, 1913.

7. Hamilton, J. D., Time Series Analysis, Princeton University Press, 1994.

8. Kleinberg, J., Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, Vol. 7, Iss. 4, Kluwer Academic Publishers, 2003.

9. Kleinberg, J., Temporal Dynamics of On-Line Information Streams. In Data Stream Management: Processing High-Speed Data Streams, Springer, 2005.

10. Kwok, C., Etzioni, O. and Weld, D., Scaling Question Answering to the Web, 10th International World Wide Web Conf., pp. 150-161, Hong Kong, 2001.

11. Radev, D. R., Qi, H., Zheng, Z., Blair-Goldensohn, S., Zhang, Z., Fan, W., and Prager, J., Mining the web for answers to natural language questions, Tenth International Conf. on Information and Knowledge Management, 2001.
