Page 1: Improving the effectiveness of Web searching: Methodological issues Barry Eaglestone Department of Information Studies University of Sheffield B.Eaglestone@shef.ac.uk.

Improving the effectiveness of Web searching:

Methodological issues

Barry Eaglestone

Department of Information Studies

University of Sheffield

B.Eaglestone@shef.ac.uk

Overview

• An inductive study to build evidence-based meta-cognitive models of web searching by the general public.

• Data modelling issues

– A Temporal data modelling solution

• Discussion & Final thoughts

An inductive study of how the general public search on the web.

Setting the scene – the database approach and state of the art.

Motivation

• Need to develop new models for searching: update outdated usage paradigms.

– Improve training methods

– Develop automated assistance systems

Previous studies of search logs

• Web search is shallow + promiscuous

• Low use of advanced features

• Global statistics

– Number of queries/search

– Pages viewed/user

– Query reformulation (change in no. of terms)

– Most users enter few terms

– Little to be gained by increasing complexity

The Team

[Diagram: Information Seeking, chemoinformatics and Database.]

Spectrum of Research Perspective

[Diagram: a spectrum running from Soft to Hard. Labels: human/organisational issues; qualitative/quantitative data analysis/modelling; modelling/engineering/empirical; formally defined problems; computer world formalisations; hardware/software solutions; People world (IS); CS computer world; discovery; invention; problem solving; formalism.]

How will we use it?

Effectiveness?

Meta-cognitive knowledge about web searching?

How do they search?

Who are the searchers? What are they searching for?

Infer effectiveness from:

• search transformation patterns

• subject’s narrative

Context

The GENERAL PUBLIC. Volunteers (c500 searches): ICT courses; university evening classes; City Learning Centre courses; citizens’ forum; personal contacts; library; advertising; students and academics

+ over 1,000,000 search logs from anonymous searchers

• Self-selected searches explained through interview and think-aloud protocols

• 2-3 set searches

Observe and record:

• Over 1,000,000 anonymous search engine transaction logs

• c500 observed and recorded searches; talk to searchers

• Determine query similarity

• Delimit searches

• Code query transformations

• Model searches as transformation graphs

• Data mine for stereotypical search strategies

• Correlate with who, why and effectiveness

• Thus, establish evidence-based models of search strategy, related to user and problem characteristics and likelihood of success

Evidence-based meta-cognitive training

Intelligent interfaces

Why Meta-cognition?

“Meta-cognition refers to higher order thinking which involves active control over the cognitive processes engaged in learning. ….” Livingston (1997)

• Meta-cognitive knowledge

– “…knowledge of personal variables to general knowledge about how human beings learn and process information, as well as individual knowledge of one’s own learning processes…”, e.g., “I have a bad memory!”

• Meta-cognitive regulation

– “… activities used to ensure that a cognitive goal has been met….”, e.g., question yourself about the text and then re-read. Livingston (1997)

Cognitive Styles Analysis

http://www.memletics.com/manual/default.asp?ref=ga&data=999+learning+styles+free+test

[Diagram: two style dimensions, Holist/Analyst and Verbalizer/Imager.]

Syntactical/quantitative: Excite search logs ($\sim 10^6$ searches)

Semantic/qualitative: holistic search logs, supplemented with qualitative data

Preliminary work

• Analysis of search logs

• Development of descriptive codes

• Aim is to form a basis for the analysis of our experimental data

Strengths / Limitations

• Large sample

• Definitely general public

• No enquiry context – what are they looking for? What are they thinking?

• No measure of success

• Are they searching or just browsing?

• Where does one enquiry end and another begin?

• Limited to one search engine – what did they do during a delay?

Excite Database Sample

qid  uid               time    rank  query                           querymore  totwords
343  000000000000006a  192141  0     alco fence company ohio         No         4
344  000000000000006a  192219  0     alco fence company ohio         No         4
345  000000000000006a  192228  10    alco fence company ohio         No         4
346  000000000000006a  192243  20    alco fence company ohio         No         4
347  000000000000006a  192328  0     lifetime fence company ohio     No         4
348  000000000000006a  192359  10    lifetime fence company ohio     No         4
349  000000000000006a  192455  0     lifetime wire fence             No         3
350  000000000000006a  192634  0     high tensile wire fence         No         4
351  000000000000006b  161906  0     sickle cell anemia              No         3
352  000000000000006b  162006  10    sickle cell anemia              No         3
353  000000000000006b  162130  0     sickle cell anemia              No         3
354  000000000000006c  144303  0     Hilton Garden Inn               No         3
355  000000000000006c  144331  0     Hilton Garden Inn Jacksonville  No         4
356  000000000000006c  144433  0     Hotel Search                    No         2
357  000000000000006c  144541  0     Jacksonvill Hotel               No         2
358  000000000000006c  144728  0     www.hilton.com                  No         1

$\sim 10^6$ queries in total. Sessions 1, 2 and 3 correspond to the three uids in the sample.
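To make the session structure concrete, here is a minimal Python sketch that parses rows laid out as in the sample and groups them by uid. It assumes whitespace-separated fields, that the query is everything between the rank and the last two columns (checked against totwords), and that each uid is one session; as the Strengths / Limitations slide notes, one uid may in reality contain several enquiries.

```python
from collections import defaultdict
from typing import NamedTuple

# Rows copied from the Excite sample (qid uid time rank query querymore totwords).
SAMPLE = """\
343 000000000000006a 192141 0 alco fence company ohio No 4
344 000000000000006a 192219 0 alco fence company ohio No 4
345 000000000000006a 192228 10 alco fence company ohio No 4
346 000000000000006a 192243 20 alco fence company ohio No 4
347 000000000000006a 192328 0 lifetime fence company ohio No 4
348 000000000000006a 192359 10 lifetime fence company ohio No 4
349 000000000000006a 192455 0 lifetime wire fence No 3
350 000000000000006a 192634 0 high tensile wire fence No 4
351 000000000000006b 161906 0 sickle cell anemia No 3
352 000000000000006b 162006 10 sickle cell anemia No 3
353 000000000000006b 162130 0 sickle cell anemia No 3
354 000000000000006c 144303 0 Hilton Garden Inn No 3
355 000000000000006c 144331 0 Hilton Garden Inn Jacksonville No 4
356 000000000000006c 144433 0 Hotel Search No 2
357 000000000000006c 144541 0 Jacksonvill Hotel No 2
358 000000000000006c 144728 0 www.hilton.com No 1
"""


class LogEntry(NamedTuple):
    qid: int
    uid: str
    seconds: int  # time of day in seconds, converted from the HHMMSS column
    rank: int
    query: str


def parse_line(line: str) -> LogEntry:
    fields = line.split()
    qid, uid, hhmmss, rank = fields[:4]
    # Assumption: the query sits between rank and the querymore/totwords columns.
    query_words = fields[4:-2]
    if len(query_words) != int(fields[-1]):
        raise ValueError(f"totwords mismatch in row {qid}")
    h, m, s = int(hhmmss[:2]), int(hhmmss[2:4]), int(hhmmss[4:])
    return LogEntry(int(qid), uid, h * 3600 + m * 60 + s, int(rank), " ".join(query_words))


def sessions(text: str) -> dict[str, list[LogEntry]]:
    """Group rows by uid, treating each uid as one session (an assumption)."""
    grouped: dict[str, list[LogEntry]] = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            entry = parse_line(line)
            grouped[entry.uid].append(entry)
    return dict(grouped)


if __name__ == "__main__":
    for uid, rows in sessions(SAMPLE).items():
        duration = rows[-1].seconds - rows[0].seconds
        print(uid, len(rows), "queries over", duration, "seconds")
```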

Query Transformations

• Changes in search strategy

– Conceptual, e.g., changes in type of search: broad/specific, text/image

– Linguistic: syntactic, query structure

• Examples

Q1: shakespeare hamlet

Q2: shakespeare hamlet quotes

Q3: to be or not to be

Q4: “to be or not to be”

Q5: “to be or not to be” +shakespeare

Our Preliminary Analysis

• To look at textual (syntactic) changes.

• Link queries by text similarity.

• Infer enquiry change from textual dissimilarity.

• Use these elements to develop a machine-readable codification of QTs (query transformations).

• To mine for characteristic patterns.

Code   Transformation
N      New query
R      A repeated query / same page rank – relevance feedback
P      Page ranking (seek more)
p      Page ranking (earlier pages)
I(k)   Identical
C(k)   Conjoint
S(k)   Sub-phrase in common
s(k)   Sub-phrase + words in common
M(k)   Other textual similarity

Example Transformations
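The page-ranking codes can be read directly off consecutive log rows. The Python sketch below is one possible reading, not the project's coder: it assumes R, P and p apply only when the query text is unchanged from the previous row, and that the rank column holds the offset of the results page (0, 10, 20). Rows whose text changed are left for the textual codes (N, I, C, S, s, M).

```python
from typing import Optional


def page_code(prev: tuple[str, int], curr: tuple[str, int]) -> Optional[str]:
    """Assign R, P or p to a (query, rank) row that follows another row.

    Assumption: returns None when the query text changed, because one of the
    textual codes (N, I, C, S, s, M) would then apply instead.
    """
    prev_query, prev_rank = prev
    query, rank = curr
    if query != prev_query:
        return None
    if rank == prev_rank:
        return "R"  # same query, same results page
    return "P" if rank > prev_rank else "p"  # later pages vs earlier pages


def page_codes(rows: list[tuple[str, int]]) -> list[Optional[str]]:
    """Codes for each pair of consecutive rows in one session."""
    return [page_code(a, b) for a, b in zip(rows, rows[1:])]


session_6b = [("sickle cell anemia", 0), ("sickle cell anemia", 10), ("sickle cell anemia", 0)]
print(page_codes(session_6b))  # ['P', 'p']
```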

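QT graphs

[Graph: query transformation graph for uid 74, running from Start through numbered query nodes (1 to 6 and 20 to 29) to END; edges are labelled with QT codes such as M, C, S, s, R and RP(14).]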

uid 74: NM(1)C(2)C(3)S(4)s(5)PPRPRRRRPPRRppI(5)s(6)s(22)s(22)s(23)s(25)s(26)s(22)R

Example queries: nursing careers; paid undergraduate nursing schools in baltimore city maryland
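A code string such as the one for uid 74 can be turned back into a sequence of (code, k) pairs for counting or mining. This Python sketch assumes the token grammar visible in the code strings on these slides (single-letter codes, QJ and QD, an optional "(k)" back-reference, and "_", which is guessed here to mark a delay); it is not the project's own parser.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical grammar: QJ, QD, a single letter, or "_" (assumed delay marker),
# optionally followed by "(k)" naming the earlier query it relates to.
TOKEN = re.compile(r"(QJ|QD|[A-Za-z]|_)(?:\((\d+)\))?")


def parse_qt_string(codes: str) -> list[tuple[str, Optional[int]]]:
    """Split a QT code string into (code, back-reference) pairs."""
    tokens = []
    pos = 0
    while pos < len(codes):
        match = TOKEN.match(codes, pos)
        if match is None:
            raise ValueError(f"unrecognised code at position {pos}: {codes[pos:]!r}")
        code, ref = match.groups()
        tokens.append((code, int(ref) if ref is not None else None))
        pos = match.end()
    return tokens


uid_74 = "NM(1)C(2)C(3)S(4)s(5)PPRPRRRRPPRRppI(5)s(6)s(22)s(22)s(23)s(25)s(26)s(22)R"
tokens = parse_qt_string(uid_74)
print(len(tokens))                          # 29
print(Counter(code for code, _ in tokens))  # frequency of each code
```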

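QT graphs

[Graph: query transformation graph for uid 342, running from Start through numbered query nodes to END; edges are labelled M, C, QJ, QD, P, P(3) and P(7), and one transition is marked Delay.]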

uid 342: NM(1)C(2)QJ(3)_C(2)PI(2)PPPPPPPC(2)PPPQJ(15)QD(15)

molsworth

"us army"

Preliminary Conclusions

• We have developed a rich set of codes describing the syntactic part of QTs

• These can be used to develop a graph-based description

• Correlations between the codes are meaningful/interesting

• They form part of the analysis for our current experimental study.

…and if you want to read about our preliminary results….

• Whittle M, Eaglestone B, Ford N, Madden A (2007), Data Mining of Search Logs, Journal of the American Society for Information Science and Technology (in press)

• Whittle M, Eaglestone B, Ford N, Gillet VJ, Madden A (2006), Query Transformations and Their Role in Web Searching by the General Public, Information Research, 12(1), October 2006

• Whittle M, Eaglestone B, Ford N, Gillet VJ, Madden A (2006), Query transformations and their role in web searching by the general public, Information Seeking in Context Conference (ISIC) 2006, Australia

• Madden A, Eaglestone B, Ford N, Whittle M (2006), Search engines: a first step to finding information: preliminary findings from a study of observed searches, Information Seeking in Context Conference (ISIC) 2006, Australia

Sheffield Experimental Study

[Diagram: data sources (screens, audio, keystrokes, queries, Web page titles); processing (transcribing, pre-processing); a temporal database; qualitative and quantitative analysis; model development.]

Data modelling issues

Evolution of databases

The database approach – A database should be a natural representation of information as data, suitable for all relevant applications without duplication, including the ones you have not yet thought of.

“A well designed database system will mirror its users’ perceptions of the problem space, and thus allows them to address the problem in hand without complexities and distractions of computer world implementation details… Implicit is the notion that users should work within the bounds of ‘good practice’”

The semantic gap

[ER diagram: Customer (1) to Sales_Order (n) via Placed_by; Salesperson (1) to Sales_Order (m) via Take_by.]

Customer
C#   Name            …
C1   Dr. Eaglestone
C2   Ms Smith

Salesperson
SP#  Name            …
S5   Mr. Chan        …
S8   Dr. Shao

SalesOrder
C#   SP#   Product   Quantity
C1   S5    P99       120
C1   S5    P2        10

The gap between what you wish to represent and what you can represent.
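As an illustration of the gap, the three tables above can be loaded into SQLite through Python's standard `sqlite3` module and joined. The fact that the diagram states directly, that a customer placed an order taken by a salesperson, is recoverable only by matching C# and SP# values. The column names `c_no` and `sp_no` are stand-ins for C# and SP#, chosen for this sketch.

```python
import sqlite3


def build_sales_db() -> sqlite3.Connection:
    """Create the Customer / Salesperson / SalesOrder tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (c_no TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE salesperson (sp_no TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE sales_order (
            c_no TEXT REFERENCES customer(c_no),
            sp_no TEXT REFERENCES salesperson(sp_no),
            product TEXT,
            quantity INTEGER
        );
        INSERT INTO customer VALUES ('C1', 'Dr. Eaglestone'), ('C2', 'Ms Smith');
        INSERT INTO salesperson VALUES ('S5', 'Mr. Chan'), ('S8', 'Dr. Shao');
        INSERT INTO sales_order VALUES ('C1', 'S5', 'P99', 120), ('C1', 'S5', 'P2', 10);
    """)
    return conn


def orders_with_names(conn: sqlite3.Connection) -> list[tuple]:
    """Rebuild 'who ordered what from whom' by joining on the key columns."""
    return conn.execute("""
        SELECT c.name, s.name, o.product, o.quantity
        FROM sales_order AS o
        JOIN customer AS c ON c.c_no = o.c_no
        JOIN salesperson AS s ON s.sp_no = o.sp_no
        ORDER BY o.product
    """).fetchall()


print(orders_with_names(build_sales_db()))
```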

… & Data Independence

[Diagram: layers Applications/Users, External Model, Logical Model, Internal Model.]

Principles of database technology…

A ready-made temporal data modelling solution

GENREG – A ready-made solution that has also been proposed for healthcare?

The Organisation: National Museum of Denmark

• Multimedia

– Pictures as well as descriptions

• Distributed

– Each department ran its own database system for its collection (ownership!)

• Object-oriented design

– Entities, not just values

• Relational implementation

Database Research

[Diagram labels: Science, Technology, Application; Theory, Praxis.]

Topology

[Diagram: Danish Pre-history, Department of Antiquity, Ethnographic Department and Coin Collection, connected by a LAN.]

1,000,000 artefacts; 200,000 images

Design / Abstractions

• Design

– Object oriented

– Based on a curator’s perspective: “Curators apply scientific training to determine the history of artefacts…creating knowledge about past and present societies by determining relationships which group artefacts within certain times and places in history”

• Abstractions

– Artefact, Event, Relationship

– Relate artefacts which participate in common events

[Diagram: a mould used to fabricate brooches.]

GENREG data model

[Diagram: entities ARTIFACT and EVENT/ARTIFACT.]

One or more artifacts participate in one or more events.

[Diagram: a burial site, two graves and six artefacts.]

[Diagram: objects labelled A to L; rooms and furniture of a Merchant’s House and a Manor House, linked by a purchase event.]

Integrated Care Pathways Application

[Procter, P., Eaglestone, B.M. & Burdis, C. “A unified model to support an information intensive healthcare environment”, MIE ’99]

[Diagram: nodes P1 to P6 arranged over time points I(t), I(t+1) and I(t+2), showing treatment, alternative diagnoses and alternative prognoses.]

A formal GENREG Model

type Genreg = abs[
    tuple[Collection : Artifacts, Events : set[Event]],
    new : () → Genreg,
    = : (Genreg × Genreg) → boolean,
    events : (Genreg) → set[Event],
    collection : (Genreg) → Artifacts ]

type Artifacts = graph[Artifact]

type Event = abs[
    tuple[id : E_Id, type : Event_Type, t : Time, place : Location, actors : set[Actor_Type], edge : set[Edge]],
    = : (Event × Event) → boolean,
    id : (Event) → E_Id,
    type : (Event) → Event_Type,
    t : (Event) → Time,
    place : (Event) → Location,
    actors : (Event) → set[Actor_Type],
    edgeset : (Event) → set[Edge] ]

type Time = abs[
    tuple[lower, upper : T],
    new : () → Time,
    = : (Time × Time) → boolean,
    before : (Time × Time) → boolean,
    meets : (Time × Time) → boolean,
    overlaps : (Time × Time) → boolean,
    during : (Time × Time) → boolean,
    starts : (Time × Time) → boolean,
    finishes : (Time × Time) → boolean,
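The Time operations named here match the interval relations of Allen's interval algebra. Assuming that reading, and assuming integer bounds with lower strictly less than upper (the slide only says `T`), a Python sketch of the type might look as follows; only the relations listed on the slide are included, and the dates in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Time:
    """An interval [lower, upper]; the dataclass's generated __eq__ plays the role of '='."""

    lower: int
    upper: int

    def __post_init__(self) -> None:
        # Assumption: proper intervals only (lower < upper).
        if self.lower >= self.upper:
            raise ValueError("lower must be strictly less than upper")

    def before(self, other: "Time") -> bool:
        return self.upper < other.lower

    def meets(self, other: "Time") -> bool:
        return self.upper == other.lower

    def overlaps(self, other: "Time") -> bool:
        return self.lower < other.lower < self.upper < other.upper

    def during(self, other: "Time") -> bool:
        return other.lower < self.lower and self.upper < other.upper

    def starts(self, other: "Time") -> bool:
        return self.lower == other.lower and self.upper < other.upper

    def finishes(self, other: "Time") -> bool:
        return self.upper == other.upper and other.lower < self.lower


burial = Time(800, 850)       # hypothetical dates, for illustration only
settlement = Time(700, 900)
print(burial.during(settlement))   # True
print(settlement.before(burial))   # False
```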

• add_artifact / delete_artifact (D, a)

• add_event / delete_event (D, e)

• merge (D, F, E)

• select_artefacts (D, p)

• select_events (D, p)

• related_to (D, n)

• related_by (D, e, n)
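A few of these operations can be sketched over an in-memory store in Python. This is an illustration under assumptions, not the GENREG implementation: artifacts are plain ids, each event records the set of artifacts taking part, Time and actors are left out, and `related_to(D, n)` is read as "artifacts that share at least one event with artifact n". The ids in the example are hypothetical, modelled on the mould-and-brooches slide.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    id: str
    type: str
    place: str
    participants: frozenset[str]  # ids of the artifacts taking part in the event


@dataclass
class Genreg:
    artifacts: set[str] = field(default_factory=set)
    events: dict[str, Event] = field(default_factory=dict)


def add_artifact(d: Genreg, a: str) -> None:
    d.artifacts.add(a)


def add_event(d: Genreg, e: Event) -> None:
    unknown = e.participants - d.artifacts
    if unknown:
        raise ValueError(f"event {e.id} refers to unknown artifacts {sorted(unknown)}")
    d.events[e.id] = e


def select_artefacts(d: Genreg, p: Callable[[str], bool]) -> set[str]:
    return {a for a in d.artifacts if p(a)}


def select_events(d: Genreg, p: Callable[[Event], bool]) -> list[Event]:
    return [e for e in d.events.values() if p(e)]


def related_to(d: Genreg, n: str) -> set[str]:
    """Artifacts sharing at least one event with artifact n (assumed reading)."""
    related: set[str] = set()
    for e in d.events.values():
        if n in e.participants:
            related |= e.participants
    related.discard(n)
    return related


db = Genreg()
for a in ("mould", "brooch_1", "brooch_2"):
    add_artifact(db, a)
add_event(db, Event("E1", "fabrication", "workshop", frozenset({"mould", "brooch_1", "brooch_2"})))
print(sorted(related_to(db, "mould")))  # ['brooch_1', 'brooch_2']
```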

Temporal Data Models (see also SQL/Temporal)

[Diagram: three axes, Entity, Attribute and Time.]

Entity: Barry; Height: 2’ 3’’; Time: 1950

Entity: Barry; Height: 5’ 10’’; Time: 2004
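The entity/attribute/time view can be sketched as timestamped attribute values with an "as of" lookup. In this Python sketch the lookup returns the latest value recorded at or before the requested time; that step-wise reading is an assumption of the sketch (and a poor one for something like height between two observations), not part of the slide.

```python
from bisect import bisect_right
from typing import Optional

# (entity, attribute) -> list of (time, value), kept sorted by time
History = dict[tuple[str, str], list[tuple[int, str]]]


def record(history: History, entity: str, attribute: str, time: int, value: str) -> None:
    rows = history.setdefault((entity, attribute), [])
    rows.append((time, value))
    rows.sort()


def value_at(history: History, entity: str, attribute: str, time: int) -> Optional[str]:
    """Latest recorded value at or before `time` (assumed step-wise semantics)."""
    rows = history.get((entity, attribute), [])
    times = [t for t, _ in rows]
    i = bisect_right(times, time)
    return rows[i - 1][1] if i > 0 else None


h: History = {}
record(h, "Barry", "Height", 2004, "5' 10''")
record(h, "Barry", "Height", 1950, "2' 3''")
print(value_at(h, "Barry", "Height", 1950))  # 2' 3''
print(value_at(h, "Barry", "Height", 2004))  # 5' 10''
```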

• Artefact histories are created retrospectively

• Multiple orthogonal time dimensions can be represented (using specialised events), e.g., discovery and historic time.

• Relationships between events and states are modelled.

• Multiple objects can represent different states and interpretations of an entity.

QT graphs

[Graph: the query transformation graph for uid 342, repeated with the labels Q3, Q4 and QJt added.]

uid 342: NM(1)C(2)QJ(3)_C(2)PI(2)PPPPPPPC(2)PPPQJ(15)QD(15)

Example queries: molsworth; "us army"

Some final thoughts…

• The Database Approach?

• Semantic gap?

• Data independence?

• Temporal modelling?

• Query language?

• So, what’s happening?

IR & DB?

IR – collections of artefacts are available for ad hoc querying (any relevant problem); the problem is modelled by the query.

DB – collections of artefacts are structured to model the problem space.

[Diagram: server(s) hold Internet-accessible repositories of artefacts; client(s) are used by researchers who derive knowledge from retrieved artefacts. A problem-related query goes to the server(s) and problem-relevant artefacts come back; the researcher’s workspace, holding an artefact collection, is developed to model the problem space.]

…final thoughts…

• Knowledge of research methodology is important (qualitative and quantitative)

• Nudist, Atlas, SPSS don’t support mixed methods

• Database approach allows integration of qualitative and quantitative data, and organisation of data to evolve to model emerging theory

• Temporal data models are key to modelling evolving strategy…
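One way to picture integrating qualitative and quantitative data on a shared time axis: the Python sketch below merges time-ordered transcript utterances with time-ordered logged actions into one timeline. The sample entries are taken from the annotated transcript of Search 1 later in this deck; reading its stamps as minutes.seconds, and splitting entries into "transcript" and "log" streams, are assumptions made for this sketch.

```python
import heapq
from typing import NamedTuple


class Record(NamedTuple):
    t: int       # seconds from the start of the search
    source: str  # "transcript" (qualitative) or "log" (quantitative)
    text: str


def mmss(stamp: str) -> int:
    """Convert an assumed 'mm.ss' stamp such as '01.27' to seconds."""
    minutes, seconds = stamp.split(".")
    return int(minutes) * 60 + int(seconds)


def merge_timeline(*streams: list[Record]) -> list[Record]:
    """Merge streams that are each already sorted by time into one timeline."""
    return list(heapq.merge(*streams, key=lambda r: r.t))


transcript = [Record(mmss("00.50"), "transcript", "I might as well go with what I know best")]
log = [
    Record(mmss("01.20"), "log", "enters 'CD albums collection'"),
    Record(mmss("01.27"), "log", "selects 2nd link (CD universe)"),
]
for r in merge_timeline(transcript, log):
    print(r.t, r.source, r.text)
```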

Acknowledgments

• The project team – Nigel Ford, Andrew Madden, Martin Whittle

• Arts and Humanities Research Council (formerly Board) for funding

• Mark Sanderson and Amanda Spink for making the Excite logs available

• Val Gillet and Eleanor Gardiner for help with graphs.

Summary

Feedback can lead to semantic changes

Complexity can be a hindrance

Searches don’t necessarily end when a searcher leaves a search engine.

Algorithm

for i = 1 to n              (loop over session queries)
    for j = 1 to i-1        (loop over previous queries)
        compare query i with query j
    choose most similar pair i, j
    analyse to assign QT type

[Diagram: session queries 1 to n along a time axis; query i is compared with each earlier query j.]
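The loop above can be sketched in Python as follows. The comparison function is a parameter; the word-overlap (Jaccard) measure used here is a simple stand-in, not the study's measure, and breaking ties in favour of the most recent earlier query is an assumption of this sketch. A result of None marks a query with no similar predecessor, i.e. a candidate new query (N).

```python
from typing import Callable, Optional


def jaccard(q1: str, q2: str) -> float:
    """Stand-in similarity: word-set overlap (not the study's Dice-based measure)."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def link_queries(queries: list[str],
                 similarity: Callable[[str, str], float]) -> list[Optional[int]]:
    """For each query i, return the index of the most similar earlier query j, or None."""
    links: list[Optional[int]] = []
    for i in range(len(queries)):
        best_j, best_score = None, 0.0
        for j in range(i):
            score = similarity(queries[i], queries[j])
            if score > 0 and score >= best_score:  # assumed: ties go to the more recent query
                best_j, best_score = j, score
        links.append(best_j)
    return links


session_6c = ["Hilton Garden Inn", "Hilton Garden Inn Jacksonville", "Hotel Search",
              "Jacksonvill Hotel", "www.hilton.com"]
print(link_queries(session_6c, jaccard))  # [None, 0, None, 2, None]
```

With this stand-in, "Jacksonvill Hotel" links only to "Hotel Search" because the misspelt "Jacksonvill" does not match "Jacksonville" as a whole word; near-misses of this kind are what a character-level word similarity is meant to catch.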

Some Preliminary Observations

• Quote marks are likely to be used with a new query.

• Delay is strongly associated with N (New query): these are successful single queries within a session.

• B (Include Boolean) & C (Conjoint) are positively associated

• B & D (Disjoint) are negatively associated

Number of words/query: Excite 2001

[Chart: normalised frequency (0 to 0.35) against terms/query (logarithmic axis, 1 to 100).]

Classification of textual QTs

• Word order, addition, subtraction.

• Inclusion or removal of

– Boolean terms

– “quotes”

• Detection of new enquiries.

• We use similarity methods to compare words and queries.
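A rough Python sketch of how these textual categories might be detected for a pair of consecutive queries, exercised on the Q1 to Q5 examples from the Query Transformations slide. Assumptions not stated on the slides: only upper-case AND/OR/NOT and +/- prefixes count as operators (so "or" and "not" in the Hamlet quote are ordinary terms), quote detection just looks for a straight double-quote character, and detection of new enquiries is left out.

```python
OPERATORS = {"AND", "OR", "NOT"}  # assumed: only upper-case operators count as Boolean


def _tokens(query: str) -> list[str]:
    return query.replace('"', " ").split()


def _words(query: str) -> list[str]:
    """Content words, lower-cased, with +/- prefixes stripped and operators dropped."""
    words = []
    for token in _tokens(query):
        if token in OPERATORS:
            continue
        word = token.lstrip("+-").lower()
        if word:
            words.append(word)
    return words


def _operator_count(query: str) -> int:
    return sum(1 for t in _tokens(query) if t in OPERATORS or t[0] in "+-")


def classify_change(q1: str, q2: str) -> set[str]:
    """Label the textual differences between query q1 and the following query q2."""
    labels = set()
    w1, w2 = _words(q1), _words(q2)
    if set(w2) - set(w1):
        labels.add("words added")
    if set(w1) - set(w2):
        labels.add("words removed")
    if sorted(w1) == sorted(w2) and w1 != w2:
        labels.add("word order")
    if '"' in q2 and '"' not in q1:
        labels.add("quotes added")
    if '"' in q1 and '"' not in q2:
        labels.add("quotes removed")
    ops1, ops2 = _operator_count(q1), _operator_count(q2)
    if ops2 > ops1:
        labels.add("operators added")
    elif ops2 < ops1:
        labels.add("operators removed")
    return labels


print(sorted(classify_change('"to be or not to be"', '"to be or not to be" +shakespeare')))
# ['operators added', 'words added']
```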

Self-selected searches

Prompts:

• Think about the last time you had trouble finding something you were looking for on the Internet.

• Do you have any hobbies or interests for which the Internet might provide useful information?

Hölscher & Strube (2000): Graphical Representation

[Figure: close-up of direct interaction with a search engine; numbers show transition probabilities. Experts and novices doing specific search tasks.]
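Transition probabilities of this kind can also be estimated from the QT code sequences. A minimal Python sketch, treating a code sequence as a first-order Markov chain; this is an assumption for illustration, since the slide does not say how Hölscher & Strube's figures were computed. The sample input is the first six codes of uid 74.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next code | current code) from adjacent pairs (first-order assumption)."""
    pair_counts: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        pair_counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for current, counts in pair_counts.items()
    }


codes = ["N", "M", "C", "C", "S", "s"]  # first six codes of uid 74
probs = transition_probabilities(codes)
print(probs["C"])  # {'C': 0.5, 'S': 0.5}
```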

Set searches

Heads:

What was written on Neville Chamberlain’s piece of paper?

You’ve won a holiday to Saga. What can you find out about the place that interests you?

Set searches

Tails:

You’ve received a postcard from friends who say they’re visiting Map. Where are they?

There are many opportunities to win things on the Internet. Can you find some that relate to your interests?

Additional search:

Find the postcode of the tallest building in the UK outside of London.

All searches were recorded using Spector Pro (keystroke recorder) and My Screen Recorder (which records voice and on-screen activity on the PC).

Annotated transcripts

Time at which stated action takes place.

Browse time preceding action

Search 1

00.50 “I might as well go with what I know best”

01.20 (enters ‘CD albums collection’)

01.27 (6s browse) Selects 2nd link (CD universe)

01.53 (31s browse) – selects Dance = 7 of ? (>24) (on LHS).

“See this is the trouble, cos I don’t really know what category it would go into. It was a mixed CD so it’s got all sorts of different things on, and there’s not really a category for that, I don’t think.”

01.56 (8s browse) – Selects Dance Collections = 7 of 12 (top of page)

Search dimensions

Volunteer  Search no.  On  Off  On/(On+Off)  Depth  Intensity: mean (s.d.)
1          1           2   1    0.67         1      43.33 (24.66)
1          2           10  8    0.56         2      14.72 (15.1)
1          3           6   5    0.55         3      12.27 (11.26)
1          4           3   1    0.75         1      7.5 (6.45)
2          1           30  14   0.68         6      4.55 (6.36)
2          2           22  8    0.73         2      7.67 (9.8)
2          3           8   1    0.89         1      13.33 (16.96)
2          4           24  2    0.92         1      6.73 (12.88)
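The On/(On+Off) column can be reproduced from the On and Off counts, which is a quick way to check the reconstructed table. The short Python check below does this for every row; what On and Off count is not defined on the slide, so the sketch treats them simply as counts.

```python
def on_ratio(on: int, off: int) -> float:
    """Proportion On / (On + Off), rounded to two places as in the table."""
    return round(on / (on + off), 2)


# (On, Off) pairs from the table: volunteer 1 searches 1-4, then volunteer 2 searches 1-4
pairs = [(2, 1), (10, 8), (6, 5), (3, 1), (30, 14), (22, 8), (8, 1), (24, 2)]
print([on_ratio(on, off) for on, off in pairs])
```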

Progress

Ca. 54 volunteers observed since Oct 2005 (representing c. 200 searches).

cf Transaction Logs

Internet searches are often regarded as being ‘shallow and promiscuous’ (i.e., many short, simple searches). This view matches how searches appear in search engine transaction logs: such logs give a useful summary of search engine use, but not of Web search behaviour viewed as a whole.

Feedback loops

Learn from previous searches

E.g. semantic shifts

Sheffield Pals Battalion

Richard Sparling

Complex search ≠ good search

Familiarity with search engine facilities (Boolean, “”, etc) does not always indicate competence.

E.g.: postcode "tallest building outside london" –london.

Use the general to find the specialist

Search engine used to find a more focussed search tool.

E.g. – searcher looking for info on B&B in York finds a directory of holiday accommodation.

• Jansen ref re complexity

• Findings title

• Search dimensions slide

• Database side – modelling.

Strengths

• Large sample.

• Natural environment.

• Definitely general public.

Limitations

• No enquiry context – what are they looking for? What are they thinking?

• No measure of success.

• Are they searching or just browsing?

• Where does one enquiry end and another begin?

• Limited to one search engine – what did they do during a delay?

Experimental Study

• Strengths

– Very detailed information.

– Searching not surfing.

– Comparison of identical enquiries.

• Limitations

– Small sample of queries.

– Limited public sample – volunteers.

This work

• Development of quantitative analysis

• Analysis of search logs (Excite 2001)

• Development of descriptive codes

• Aim is to form a basis for the analysis of our experimental data

Aims of Quantitative Analysis

• To look at textual (syntactic) changes.

• Link queries by text similarity.

• Infer enquiry change from textual dissimilarity.

• Use these elements to develop a machine-readable codification of query transformations (QTs).

Word similarity

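Dice coefficient: $S_W = \dfrac{2c}{a + b}$, where a and b are the lengths of the two words and c is the number of letters that coincide when one word is shifted along the other to its best match.

Example: for "elected" and "election" the best shift lines up e, l, e, c, t, so $S_W = \dfrac{2 \times 5}{7 + 8} = \dfrac{10}{15} \approx 0.667$.

Drawback: on this measure "doing" and "going" are very similar ($S_W = 0.8$), while "bug" and "debugging" have $S_W = 0.5$.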

Word Similarity Threshold

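$S_W(\text{ping}, \text{ding}) = 6/8 = 0.75$
$S_W(\text{bring}, \text{thing}) = 6/10 = 0.6$
$S_W(\text{trying}, \text{string}) = 6/12 = 0.5$
$S_W(\text{nursing}, \text{training}) = 6/15 = 0.4$

• Partial solution: introduce a word similarity threshold, WST = 0.4
• Any pair less similar than WST is given $S_W = 0$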
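The shift-and-match Dice coefficient and the WST cut-off can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes c is the largest number of positions holding the same letter over all relative shifts of the two words, and the names `matching_letters` and `word_similarity` are hypothetical.

```python
def matching_letters(a: str, b: str) -> int:
    """Most positions holding the same letter when b is shifted along a."""
    best = 0
    # offset: index in a where b[0] sits; covers every partial overlap of the two words
    for offset in range(-(len(b) - 1), len(a)):
        hits = sum(
            1
            for i, ch in enumerate(b)
            if 0 <= offset + i < len(a) and a[offset + i] == ch  # logical AND: same letter
        )
        best = max(best, hits)
    return best


def word_similarity(a: str, b: str, wst: float = 0.4) -> float:
    """Dice coefficient S_W = 2c / (len(a) + len(b)); values below WST are set to 0."""
    if not a or not b:
        return 0.0
    s_w = 2 * matching_letters(a, b) / (len(a) + len(b))
    return s_w if s_w >= wst else 0.0


if __name__ == "__main__":
    pairs = [("elected", "election"), ("doing", "going"), ("bug", "debugging"),
             ("nursing", "training"), ("leaf", "gelatine")]
    for w1, w2 in pairs:
        raw = word_similarity(w1, w2, wst=0.0)  # plain Dice value
        cut = word_similarity(w1, w2)           # with WST = 0.4
        print(f"{w1:>8} {w2:<10} S_W = {raw:.3f}  after WST: {cut:.3f}")
```

With wst=0.0 the function returns the plain Dice value. Under the default WST = 0.4, leaf/gelatine (2/12, about 0.17) is zeroed, while nursing/training sits exactly at 0.4 and is kept, since only values less than WST are discarded.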

Query Similarity

• For each word in query 1 find the most similar word in query 2 and combine results

• Accommodates repeated words (in query 2) without weighting

• Main point of WST is to avoid the accumulation of many small contributions to the query similarity

Query Similarity Example

Query 1: leaf gelatin supplier barcelona
Query 2: gelatine supplies in spain

Best word scores: leaf 0, gelatin 0.93, supplier 0.88, barcelona 0

$$S_Q = \frac{\text{sum of scores}}{\text{number of words}} = \frac{0 + 0.93 + 0.88 + 0}{4} \approx 0.453$$
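The query-level measure can be sketched the same way. This is a minimal sketch under stated assumptions: queries are lower-cased and split on whitespace, and the sum is divided by the larger of the two word counts (the slide says only "number of words"; both example queries have four). The word-level helpers are included so the snippet runs on its own; all names are hypothetical.

```python
def matching_letters(a: str, b: str) -> int:
    """Most positions holding the same letter when b is shifted along a."""
    return max(
        (sum(1 for i, ch in enumerate(b) if 0 <= off + i < len(a) and a[off + i] == ch)
         for off in range(-(len(b) - 1), len(a))),
        default=0,
    )


def word_similarity(a: str, b: str, wst: float = 0.4) -> float:
    """Dice coefficient 2c / (len(a) + len(b)), zeroed below the word threshold WST."""
    s_w = 2 * matching_letters(a, b) / (len(a) + len(b)) if a and b else 0.0
    return s_w if s_w >= wst else 0.0


def query_similarity(q1: str, q2: str, wst: float = 0.4) -> float:
    """S_Q: best S_W for each word of q1 against the words of q2, summed and normalised."""
    words1, words2 = q1.lower().split(), q2.lower().split()
    if not words1 or not words2:
        return 0.0
    # max() picks the best-matching word in q2, so repeats in q2 add no extra weight
    total = sum(max(word_similarity(w, v, wst) for v in words2) for w in words1)
    # Assumption: normalise by the larger word count
    return total / max(len(words1), len(words2))


if __name__ == "__main__":
    s_q = query_similarity("leaf gelatin supplier barcelona", "gelatine supplies in spain")
    print(f"S_Q = {s_q:.3f}")
```

It prints S_Q = 0.452; the slide's 0.453 comes from rounding the two word scores (exactly 14/15 and 14/16) to 0.93 and 0.88 before summing. Against the target "one two three" it gives 1/3 for "one four five" (above QST = 0.3) and 1/4 for "one four five six" (below it), in line with the W and Z rows of the transformation table.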

Query Similarity Threshold

We look for the previous query j (earlier in time) that is most similar to query i.

If none is similar, i may be a new enquiry.

Set QST = 0.3 as the lowest acceptable similarity for a valid query connection.

Setting WST and QST

• Values narrowed down by close inspection of the results.

• In the first 300 queries, the settings WST = 0.4 and QST = 0.3 agreed with a human analysis of the best categorisation in all cases bar one, which was in any case an unusual entry.

Algorithm

for i = 1 to n                        (loop over session queries)
    for j = 1 to i-1                  (loop over previous queries)
        compare query i with query j
    choose the most similar pair i, j; assign k = j
    analyse queries k and i to assign the QT type
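The double loop, together with the QST cut-off, might look like this in Python. It is a sketch under stated assumptions: indices are 0-based, ties go to the most recent earlier query, and `word_overlap` is a hypothetical stand-in similarity (shared words over the larger word count) that can be swapped for the Dice-based query similarity. A result of None corresponds to "maybe i is a new enquiry".

```python
from typing import Callable, List, Optional, Tuple


def word_overlap(q1: str, q2: str) -> float:
    """Hypothetical stand-in similarity: shared words / larger word count."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def link_queries(
    queries: List[str],
    similarity: Callable[[str, str], float] = word_overlap,
    qst: float = 0.3,
) -> List[Optional[Tuple[int, float]]]:
    """For each query i return (k, score) for its most similar earlier query k,
    or None if no earlier query reaches the query similarity threshold QST."""
    links: List[Optional[Tuple[int, float]]] = []
    for i in range(len(queries)):                  # loop over session queries
        best: Optional[Tuple[int, float]] = None
        for j in range(i):                         # loop over previous queries
            score = similarity(queries[i], queries[j])
            if best is None or score >= best[1]:   # '>=': ties go to the later query
                best = (j, score)
        links.append(best if best is not None and best[1] >= qst else None)
    return links


if __name__ == "__main__":
    session = ["ceo compensation", "2000 ceo compensation", "f8", "f8 airplane"]
    for i, link in enumerate(link_queries(session)):
        print(i, session[i], "->", "new enquiry?" if link is None else f"k = {link[0]}")
```

Passing the similarity as a parameter keeps the linking loop separate from the choice of measure, so WST and QST can be tuned independently.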

“Trivial” Transformations

Code  Transformation
U     Unique
N     New query
R     Repeated query
P     Page viewing (seek more)
p     Page viewing (earlier pages)

Substantive Transformations I

Code Transformation (relative to k)

I(k) Identical

J(k) Identical apart from Quotes/Boolean

C(k) Conjoint

D(k) Disjoint

S(k) Sub-phrase in common

s(k) Sub-phrase + words in common

Substantive Transformations II

Code Transformation (relative to k)

W(k) Single word in common

w(k) Separated single words in common

M(k) Other textual similarity

Below Threshold Similarity

Z(k) Not similar but word in common

z(k) Not similar but words in common

Target: "one two three" (written 123)

                            Comparison  Symbol  Type
Basic transformations       1234        C       Conjunction
                            12          D       Disjunction
Common sub-phrase           124         S       Replacement
                            231         s       Reordering
                            1243        s       Insertion/removal
Common word                 145         W       Replacement
                            132         w       Reordering
                            143         w       Replacement/insertion
Below-threshold similarity  1456        Z       Common word
                            1245678     z       Common phrase

Supplementary Transformations

Code  Transformation
B     Include Boolean term
b     Remove Boolean term
Q     Include quote marks
q     Remove quote marks
_     Delay > 1 hour

Example full transformation

May include up to 4 terms, e.g. BQC(4)_:
B = Boolean, Q = quote marks, C(4) = substantive transformation, _ = delay
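A full transformation code has a regular shape (optional B/b, optional Q/q, one main code with an optional (k), optional _), so a regular expression can parse a single code and split a session's concatenated modification list. This is a minimal sketch; the pattern and the names `parse_code` and `split_session` are assumptions inferred from the codes listed on these slides.

```python
import re
from typing import List, NamedTuple, Optional

# One query's code: optional Boolean flag, optional quote flag, one main code
# (trivial, or substantive with a "(k)" reference), optional delay marker.
CODE_RE = re.compile(
    r"(?P<boolean>[Bb])?(?P<quote>[Qq])?"
    r"(?:(?P<trivial>[UNRPp])|(?P<substantive>[IJCDSsWwMZz])\((?P<k>\d+)\))"
    r"(?P<delay>_)?"
)


class Transformation(NamedTuple):
    boolean: Optional[str]  # 'B' include / 'b' remove a Boolean term
    quote: Optional[str]    # 'Q' include / 'q' remove quote marks
    main: str               # trivial or substantive code letter
    k: Optional[int]        # earlier query a substantive code refers to
    delay: bool             # '_': delay > 1 hour


def parse_code(code: str) -> Transformation:
    m = CODE_RE.fullmatch(code)
    if m is None:
        raise ValueError(f"not a transformation code: {code!r}")
    return Transformation(
        m["boolean"], m["quote"], m["trivial"] or m["substantive"],
        int(m["k"]) if m["k"] else None, m["delay"] is not None,
    )


def split_session(codes: str) -> List[str]:
    """Split a concatenated modification list into one code per query."""
    pieces = [m.group(0) for m in CODE_RE.finditer(codes)]
    if "".join(pieces) != codes:
        raise ValueError(f"unparsed characters in {codes!r}")
    return pieces


if __name__ == "__main__":
    print(parse_code("BQC(4)_"))
    print(split_session("N_NC(2)PPPPW(3)NC(9)P"))
```

`split_session` checks that every character was consumed, so a malformed list raises an error instead of silently dropping codes.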

Some examples

Code    Query 1                                        Query 2
QJ(k)   bargain music                                  “bargain music”
QC(k)   Bacteremia                                     “Pneumoccol Bacteremia”
qJ(k)   “university of texas” “alternative medicine”   university of texas” “alternative medicine”
qw(k)   "tax law_depreciation system"                  tax law/depreciation system
BC(k)   "the sopranos"                                 "the sopranos" +scripts
BJ(k)   +"Complaint form letters" Insurance            +"Complaint form letters" +Insurance
BS(k)   doppler effect labs                            doppler effect +lab
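The simpler codes in these examples can be detected by comparing word sets and counting Boolean terms and quote marks. This is a partial, hypothetical sketch: it assumes '+' prefixes and upper-case AND/OR/NOT are the Boolean terms, treats straight and curly double quotes alike, and returns only I, J, C or D (with any B/b/Q/q prefix). The sub-phrase and word codes (S, s, W, w, M, Z, z) need the similarity measures and are returned as None.

```python
from typing import List, Optional

BOOLEAN_WORDS = {"AND", "OR", "NOT"}
QUOTES = '"\u201c\u201d'  # straight and curly double quotes


def boolean_count(query: str) -> int:
    """Assumed Boolean terms: '+' prefixes and upper-case AND/OR/NOT."""
    return query.count("+") + sum(tok in BOOLEAN_WORDS for tok in query.split())


def quote_count(query: str) -> int:
    return sum(query.count(ch) for ch in QUOTES)


def words(query: str) -> List[str]:
    """Lower-cased words with Boolean operators and quote marks removed."""
    out = []
    for tok in query.split():
        if tok in BOOLEAN_WORDS:
            continue
        tok = tok.strip("+" + QUOTES).lower()
        if tok:
            out.append(tok)
    return out


def classify(prev: str, new: str) -> Optional[str]:
    """Partial QT code of `new` relative to the earlier query `prev` (query k)."""
    if new == prev:
        return "I"
    prefix = ""
    if boolean_count(new) != boolean_count(prev):
        prefix += "B" if boolean_count(new) > boolean_count(prev) else "b"
    if quote_count(new) != quote_count(prev):
        prefix += "Q" if quote_count(new) > quote_count(prev) else "q"
    old_w, new_w = words(prev), words(new)
    if new_w == old_w:
        main = "J"        # identical apart from quotes/Boolean
    elif set(old_w) < set(new_w):
        main = "C"        # conjoint: words added
    elif set(new_w) < set(old_w):
        main = "D"        # disjoint: words removed
    else:
        return None       # S, s, W, w, M, Z, z: need the similarity measures
    return prefix + main


if __name__ == "__main__":
    print(classify('"the sopranos"', '"the sopranos" +scripts'))
```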

More examples

Code    Query 1                                        Query 2
Bs(k)   conferences image processing                   +image +processing +conferences +finland
BqW(k)  "Craig Larman"                                 +Larman +Valtech
BqZ(k)  +"lbp 1000" +review                            +canon +review +laser +printer
BqW(k)  Hevia AND bagpipe                              "Spanish bagpipe"
bQs(k)  +used +horse +trailer +arndt                   +"horse trailer" used
bqW(k)  +arndt +"horse trailer" used                   +Arndt trailer
bqs(k)  +Moby +southside +"Gwen Stefani" +mp3         +Moby +southside +mp3

Output for the first 100 Excite queries

Source file: excite.txt
word modification threshold : 0.400000
query modification level : 0.300000
sub-session delay/s : 3600

qid0  uid      nq  Modification list
1     1    **  1   U
2     2    **  5   NW(1)_NPP
7     3    **  4   NS(1)PP
11    4    **  1   U
12    5    **  1   U
13    6    **  1   U
14    7    **  5   N_QNPPP
19    8    **  4   NPPP
23    9    **  1   U
24    10   **  4   NQJ(1)NQN
28    11   **  5   N_NN_NP
33    12   **  2   N_N
35    13   **  3   NR_R
38    14   **  1   U
39    15   **  1   U
40    16   **  4   NM(1)RN
44    17   **  21  N_N_NC(1)PPPPNW(9)PPPPC(10)PPPPPP
65    18   **  2   NP
67    19   **  10  NRPC(1)RP_NS(7)D(7)I(7)
77    20   **  1   QU
78    21   **  1   U
79    22   **  1   U
80    23   **  1   U
81    24   **  1   U
82    25   **  1   QU
83    26   **  11  N_NC(2)PPPPW(3)NC(9)P
94    27   **  5   NNW(2)RR
99    28   **  1   U
100   29   **  3   NW(1)_M(1)

Example (uid 26): N_NC(2)PPPPW(3)NC(9)P

One session: 3 sub-sessions

n   qid  uid               time    rank  query                  querymore  totwords  code
1   83   000000000000001a  083122  0     chicago sun times      No         3         N_
2   84   000000000000001a  105439  0     f8                     No         1         N
3   85   000000000000001a  105453  0     f8 airplane            No         2         C(2)
4   86   000000000000001a  105536  10    f8 airplane            No         2         P
5   87   000000000000001a  105614  20    f8 airplane            No         2         P
6   88   000000000000001a  105630  30    f8 airplane            No         2         P
7   89   000000000000001a  105731  40    f8 airplane            No         2         P
8   90   000000000000001a  105740  0     airplanes f8           No         2         W(3)
9   91   000000000000001a  113441  0     ceo compensation       No         2         N
10  92   000000000000001a  113633  0     2000 ceo compensation  No         3         C(9)
11  93   000000000000001a  113752  10    2000 ceo compensation  No         3         P
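The page-viewing codes and the delay marker in this session can be derived from the log fields alone. This sketch uses assumed rules that the slides do not state: a row repeating the previous query text is P if its rank rises, p if it falls and R otherwise, and '_' is appended to the query before a gap longer than 3600 s (the sub-session delay in the program output). Rows that need the similarity analysis are marked '?'; `LogRow` and `trivial_codes` are hypothetical names.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    qid: int
    time: str   # HHMMSS, as in the Excite log extract
    rank: int   # first result shown: 0, 10, 20, ...
    query: str


def seconds(hhmmss: str) -> int:
    """HHMMSS -> seconds since midnight (sessions crossing midnight are not handled)."""
    return int(hhmmss[:2]) * 3600 + int(hhmmss[2:4]) * 60 + int(hhmmss[4:6])


def trivial_codes(rows: List[LogRow], delay_s: int = 3600) -> List[str]:
    if len(rows) == 1:
        return ["U"]                      # single-query session
    codes = []
    for i, row in enumerate(rows):
        prev = rows[i - 1] if i > 0 else None
        if prev is None:
            codes.append("N")             # first query of the session
        elif row.query != prev.query:
            codes.append("?")             # needs the similarity analysis
        elif row.rank > prev.rank:
            codes.append("P")             # same query, later results page
        elif row.rank < prev.rank:
            codes.append("p")             # same query, earlier results page
        else:
            codes.append("R")             # same query, same page
    for i in range(len(rows) - 1):
        if seconds(rows[i + 1].time) - seconds(rows[i].time) > delay_s:
            codes[i] += "_"               # delay marker on the query before the gap
    return codes


SESSION = [
    LogRow(83, "083122", 0, "chicago sun times"),
    LogRow(84, "105439", 0, "f8"),
    LogRow(85, "105453", 0, "f8 airplane"),
    LogRow(86, "105536", 10, "f8 airplane"),
    LogRow(87, "105614", 20, "f8 airplane"),
    LogRow(88, "105630", 30, "f8 airplane"),
    LogRow(89, "105731", 40, "f8 airplane"),
    LogRow(90, "105740", 0, "airplanes f8"),
    LogRow(91, "113441", 0, "ceo compensation"),
    LogRow(92, "113633", 0, "2000 ceo compensation"),
    LogRow(93, "113752", 10, "2000 ceo compensation"),
]

if __name__ == "__main__":
    print(trivial_codes(SESSION))
```

On this session every position the rules can decide agrees with the slide's codes: N_ for query 1 (the 08:31 to 10:54 gap exceeds an hour) and P for queries 4 to 7 and 11.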

Query lengths

[Figure: frequency (log scale, 1 to 1,000,000) against length in queries (log scale, 1 to 100), for sessions and sub-sessions.]

10% of sub-sessions are at least 7 queries in length.

QT relative frequencies

[Figure: percentage frequency (0 to 35%) of each query transformation code: U N P p R I J C D S s W w M Z z B b Q q _.]

Terminal QTs

[Figure: terminal QT ratio (0 to 1.2) for each query transformation code: U N P p R I J C D S s W w M Z z B b Q q _.]

$$\text{Ratio} = \frac{\text{FinalFreq}(QT)}{\text{Freq}(QT)}$$

where FinalFreq(QT) counts occurrences of QT as the last query in a sub-session.
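Both the relative frequencies and the terminal ratio can be computed straight from modification lists, because every query carries exactly one main code letter and at most one of each supplementary symbol, while the digits in (k) are not letters. This sketch makes two assumptions: percentages are taken over all queries, and each list is treated as one sub-session. The sample lists come from the program output above (uids 26, 1 and 13).

```python
from collections import Counter
from typing import Dict, List

MAIN = set("UNRPpIJCDSsWwMZz")   # exactly one per query
SUPPLEMENTARY = set("BbQq_")     # at most one of each per query


def relative_frequencies(lists: List[str]) -> Dict[str, float]:
    """Percentage of queries carrying each code symbol (denominator: all queries)."""
    counts = Counter(ch for s in lists for ch in s if ch in MAIN or ch in SUPPLEMENTARY)
    n_queries = sum(counts[ch] for ch in MAIN)
    if n_queries == 0:
        return {}
    return {ch: 100.0 * c / n_queries for ch, c in counts.items()}


def terminal_ratios(lists: List[str]) -> Dict[str, float]:
    """FinalFreq(QT) / Freq(QT) for the main codes, one sub-session per list."""
    freq = Counter(ch for s in lists for ch in s if ch in MAIN)
    final: Counter = Counter()
    for s in lists:
        mains = [ch for ch in s if ch in MAIN]
        if mains:
            final[mains[-1]] += 1       # code of the last query in the sub-session
    return {ch: final[ch] / freq[ch] for ch in freq}


if __name__ == "__main__":
    sample = ["N_NC(2)PPPPW(3)NC(9)P", "U", "NR_R"]
    print(relative_frequencies(sample))
    print(terminal_ratios(sample))
```

U always scores a terminal ratio of 1, since a unique query is necessarily the last in its sub-session.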

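QT graphs

[Figure: QT graph for uid 74, running from Start to END; nodes are query positions and edges are labelled with QT codes (M, C, S, s, R), with RP(14) marking a run of 14 repeat and page-view codes. Example queries: "nursing careers" and "paid undergraduate nursing schools in baltimore city maryland".]

uid 74: NM(1)C(2)C(3)S(4)s(5)PPRPRRRRPPRRppI(5)s(6)s(22)s(22)s(23)s(25)s(26)s(22)R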

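QT graphs

[Figure: QT graph for uid 342, running from Start to END with a delay marked; edges are labelled with QT codes (M, C, QJ, QD, P), and runs of page views are collapsed to P(7) and P(3). Example queries: "molsworth" and "us army".]

uid 342: NM(1)C(2)QJ(3)_C(2)PI(2)PPPPPPPC(2)PPPQJ(15)QD(15)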

Frequency of nodes with k connections

[Figure: ln(f) against k (0 to 10) for query lengths 10 and 20. The points follow a line of slope -1, i.e. exponential scaling, $f(k) \propto e^{-k}$.]
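A QT graph and the degree counts f(k) behind this plot can be sketched by turning each (k) reference into an edge from query k to query i. This is a minimal, hypothetical version: nodes are 1-based query positions, and trivial codes (P, R, ...) get no edge here, whereas the QT graph slides collapse them into runs such as P(7).

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# One query's code; group 1 captures k in substantive codes such as C(2) or QJ(15)
TOKEN_RE = re.compile(r"[Bb]?[Qq]?(?:[UNRPp]|[IJCDSsWwMZz]\((\d+)\))_?")


def reference_edges(codes: str) -> List[Tuple[int, int]]:
    """Edges (k, i): query i (1-based position) was coded relative to query k."""
    return [
        (int(m.group(1)), i)
        for i, m in enumerate(TOKEN_RE.finditer(codes), start=1)
        if m.group(1)
    ]


def degrees(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Number of connections of each node that appears in an edge."""
    deg: Counter = Counter()
    for k, i in edges:
        deg[k] += 1
        deg[i] += 1
    return dict(deg)


def degree_distribution(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """f(k): how many nodes have exactly k connections."""
    return dict(Counter(degrees(edges).values()))


if __name__ == "__main__":
    uid_342 = "NM(1)C(2)QJ(3)_C(2)PI(2)PPPPPPPC(2)PPPQJ(15)QD(15)"
    edges = reference_edges(uid_342)
    print(edges)
    print(degree_distribution(edges))
```

For uid 342, query 2 is the hub (five connections) and query 15 links to the final two queries, matching the branching visible in that graph.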

Intra-QT correlations

• f(A, B): measured coincident frequency of codes A and B

• E{}: expected value; V{}: variance

$$D(A_i, A_j) = \frac{f(A_i, A_j) - E\{f(A_i, A_j)\}}{\sqrt{V\{f(A_i, A_j)\}}}$$

Correlations within a transform, e.g. [BQC(3)_]
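The standardised deviation D can be computed from co-occurrence counts once a null model supplies E{} and V{}. The slide does not state that model; this sketch assumes codes occur independently, so that over N transforms the pair count is binomial with probability p_A·p_B. Transforms are given as sets of code symbols, e.g. {"B", "Q", "C", "_"} for [BQC(3)_]; the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def intra_qt_deviation(transforms: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """D(A, B) = (f - E{f}) / sqrt(V{f}) for codes A, B in the same transform.

    Assumed null model: codes occur independently, so with N transforms and
    marginal probabilities p_A, p_B, f ~ Binomial(N, p_A * p_B).
    """
    n = len(transforms)
    single = Counter(code for t in transforms for code in t)
    pair = Counter(frozenset(p) for t in transforms for p in combinations(sorted(t), 2))
    result = {}
    for a, b in combinations(sorted(single), 2):
        p_ab = (single[a] / n) * (single[b] / n)
        expected = n * p_ab
        variance = n * p_ab * (1 - p_ab)
        if variance > 0:
            result[(a, b)] = (pair[frozenset((a, b))] - expected) / math.sqrt(variance)
    return result


if __name__ == "__main__":
    toy = [{"B", "C"}] * 3 + [{"D"}] * 3 + [{"C"}] + [{"N"}] * 3
    for key, d in intra_qt_deviation(toy).items():
        print(key, round(d, 2))
```

On the toy data, B always appears with C and never with D, so D(B, C) is positive and D(B, D) negative, the same sign pattern the slides report for the Excite log.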

Intra-QT correlations

Columns: B, b, Q, q and _ (delay); – = no value.

Type  B       b       Q       q       _
U     20.60   –       1.32    –       –
N     -1.48   –       23.26   –       78.27
P     –       –       –       –       -66.16
p     –       –       –       –       -9.63
R     –       –       –       –       10.53
I     –       –       –       –       4.45
J     61.85   47.37   136.42  78.37   -5.74
C     46.02   -42.81  -15.14  -19.22  -4.70
D     -34.07  62.20   -15.09  13.45   -4.79
S     -24.52  -11.14  -20.69  -7.63   -5.65
s     -2.62   9.93    -7.05   3.65    -8.04
W     -35.00  -10.35  -32.99  -6.81   -6.05
w     -2.63   9.14    -11.51  -0.98   -8.18
M     -21.05  -12.98  -37.31  -13.28  -1.97
Z     -2.26   14.11   -10.06  2.23    -0.90
z     1.78    2.82    0.55    1.45    0.95
B     0.00    –       1.16    76.78   -15.01
b     –       0.00    74.95   10.05   -11.07
Q     1.16    74.95   0.00    –       -0.28
q     76.78   10.05   –       0.00    -7.77
_     -15.01  -11.07  -0.28   -7.77   0.00

Example: [BQC(3)_]

Some Observations

• Quote marks are likely to be used with a new query.

• Delay is strongly associated with N: these are successful single queries within a session.

• B & C are positively associated

• B & D are negatively associated

Application to Experimental Results

Query Transforms

qid  SS  Query                            QM(similarity)  QM(preceding)
1    *   CD albums collection             N               N
2        CD albums collection             R               R
3    *   Autotrader                       N               N
4    *   atlas                            N               N
5    *   place names                      N               N
6        place names                      R               R
7    *   map                              N               N
8    *   online competitions              N               N
9    *   Tall British buildings           N               N
10       Tall buildings                   w(9)            w(9)
11       Tall buildings                   R               R
12       Tall buildings                   R               R
13       Tall buildings in Britain        w(9)            C(12)
14       Tallest building outside London  M(9)            M(13)

Temporal Database

• A repository of all data for each session
• Accessible to SQL
• Used to build evidence-based models for searching

[Figure: the data held for each session, linked by user (uid) and query (qid) identifiers: background details (web experience, cognitive style scores); subjects' appraisal of searches; search queries and web page titles; keystroke record and activity timings; query modification codes; qualitative analysis.]
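A repository like this maps naturally onto relational tables keyed by uid and qid. The schema below is hypothetical (table and column names are illustrative, and only three of the data sources are modelled) and uses Python's built-in sqlite3 module; the sample rows are taken from the Excite session shown earlier, with uid 000000000000001a written as 26 and C(2) pointing at the second query of the session.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the project's own.
SCHEMA = """
CREATE TABLE subject (
    uid             TEXT PRIMARY KEY,
    web_experience  TEXT,      -- background details
    cognitive_style REAL       -- cognitive style score
);
CREATE TABLE search_query (
    qid        INTEGER PRIMARY KEY,
    uid        TEXT NOT NULL REFERENCES subject(uid),
    issued_at  TEXT NOT NULL,  -- activity timing: the temporal dimension
    query_text TEXT NOT NULL,
    page_title TEXT            -- web page title reached, if any
);
CREATE TABLE query_code (
    qid     INTEGER PRIMARY KEY REFERENCES search_query(qid),
    code    TEXT NOT NULL,     -- QT code, e.g. 'N_', 'C(2)', 'P'
    ref_qid INTEGER REFERENCES search_query(qid)  -- the query k, if any
);
"""


def demo() -> list:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO subject (uid) VALUES ('26')")
    con.executemany(
        "INSERT INTO search_query (qid, uid, issued_at, query_text) VALUES (?, ?, ?, ?)",
        [(84, "26", "10:54:39", "f8"),
         (85, "26", "10:54:53", "f8 airplane"),
         (86, "26", "10:55:36", "f8 airplane")],
    )
    con.executemany(
        "INSERT INTO query_code VALUES (?, ?, ?)",
        [(84, "N", None), (85, "C(2)", 84), (86, "P", None)],
    )
    # Time-ordered view of one user's queries with their QT codes
    rows = con.execute(
        """SELECT q.issued_at, q.query_text, c.code
           FROM search_query AS q JOIN query_code AS c ON c.qid = q.qid
           WHERE q.uid = ? ORDER BY q.issued_at""",
        ("26",),
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Keeping the QT codes in their own table, keyed by qid, lets the quantitative codes be joined with timings, appraisals and qualitative annotations in ordinary SQL.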

Acknowledgments

• Arts and Humanities Research Council (formerly Board) for funding

• Mark Sanderson and Amanda Spink for making the Excite logs available

Questions ?

Setting WST and QST

[Figure: Excite data, WST = 0.4. Frequency (0 to 350,000) of Tot New, Tot Mod and z+Z, plotted on a 0 to 1 axis labelled "Query Transformation".]

Inter-QT correlations

• f(A | B): measured frequency of code B following code A

• E{}: expected value; V{}: variance

$$D(A_i \mid B_j) = \frac{f(A_i \mid B_j) - E\{f(A_i \mid B_j)\}}{\sqrt{V\{f(A_i \mid B_j)\}}}$$

Correlations of one transform with the next.
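The sequential version counts adjacent pairs instead of co-occurrences. As with the within-transform case, the null model is an assumption not stated on the slide: successive codes are taken to be independent, so over M adjacent pairs f(A|B) is binomial with probability p_A·p_B. Sessions are given as lists of main code letters; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def inter_qt_deviation(sessions: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """D(A|B) for code B immediately following code A within a session.

    Assumed null model: successive codes are independent, so over M adjacent
    pairs f(A|B) ~ Binomial(M, p_A * p_B), with p_A, p_B the marginal frequencies.
    """
    single = Counter(code for s in sessions for code in s)
    total = sum(single.values())
    follows = Counter((s[i], s[i + 1]) for s in sessions for i in range(len(s) - 1))
    m = sum(follows.values())
    result = {}
    for a in single:
        for b in single:
            p_ab = (single[a] / total) * (single[b] / total)
            expected = m * p_ab
            variance = m * p_ab * (1 - p_ab)
            if variance > 0:
                result[(a, b)] = (follows[(a, b)] - expected) / math.sqrt(variance)
    return result


if __name__ == "__main__":
    toy = [list("NPPPP"), list("NCPP"), list("NNN")]
    d = inter_qt_deviation(toy)
    print(round(d[("P", "P")], 2), round(d[("P", "N")], 2))
```

Runs of page views give P a positive self-correlation, echoing the large diagonal entries that the observations read as habitual tendencies.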

Inter-QT correlations

Columns: prior transformation; rows: posterior transformation; _ = delay.

Type N P p R I J C D S s W w M Z z B b Q q _
N 82.40 -39.20 -2.92 13.26 22.95 2.22 10.77 9.92 5.92 -2.37 23.22 2.37 30.55 11.24 3.99 22.86 8.45 17.84 6.60 102.17
P -42.39 323.03 9.91 -15.98 -17.58 -9.12 -4.10 -6.83 -12.90 -5.45 -19.89 -5.75 -32.02 -8.76 -2.25 -25.01 -19.35 -18.59 -7.81 -71.47
p -50.08 79.89 154.30 17.11 4.96 -8.42 -18.06 -10.74 -15.30 -10.98 -21.52 -11.35 -18.32 -8.35 -2.35 -21.79 -10.70 -17.30 -7.25 21.57
R 125.10 -85.27 3.73 198.05 23.30 -2.83 0.55 -2.51 -3.93 -6.24 1.94 -3.17 14.86 1.31 -0.46 -16.30 -12.24 -0.72 -6.71 89.80
I -8.96 -39.39 7.11 25.19 152.36 23.27 35.60 20.45 19.44 10.92 33.41 15.91 61.29 5.88 1.04 0.33 6.43 -0.72 4.76 61.21
J 31.31 -28.13 0.42 -1.56 -2.36 45.43 29.05 12.92 21.68 19.21 15.47 15.55 10.37 7.08 4.06 66.72 37.31 70.63 46.88 -5.89
C 98.65 -27.61 -2.25 -7.92 -3.51 9.43 50.98 -1.42 2.57 -5.27 11.76 -2.43 7.80 10.78 1.98 33.37 6.34 25.51 3.16 -8.53
D 39.12 -24.03 -2.58 -3.66 -0.82 23.95 14.41 21.89 32.39 29.83 26.52 21.93 -4.62 11.31 4.55 45.21 24.60 57.86 14.67 5.58
S 35.67 -30.46 -3.62 -7.55 0.35 12.88 31.20 28.48 108.55 44.56 27.07 25.89 -6.91 26.54 5.79 56.24 35.14 39.28 17.90 6.90
s 8.44 -18.69 -2.58 -6.79 -1.78 15.49 43.13 15.71 59.83 117.15 1.57 34.34 -12.48 30.55 21.59 46.67 34.77 33.33 22.27 1.00
W 79.54 -43.79 -5.10 -9.05 4.91 15.72 16.39 32.98 10.95 -0.93 117.56 23.20 24.22 14.02 -0.47 70.07 38.85 46.57 17.34 27.82
w 17.74 -17.47 -2.16 -5.35 2.10 12.61 23.19 16.82 22.55 23.51 44.17 66.50 -2.25 18.13 3.57 39.50 35.21 26.21 14.42 6.21
M 109.09 -57.39 -6.00 0.68 8.81 4.55 -5.14 7.04 -11.05 -11.98 4.69 -7.25 160.36 -3.45 -2.86 31.61 14.40 9.17 4.19 31.52
Z 37.56 -13.24 -3.22 -0.98 1.32 6.09 9.11 5.53 17.10 13.88 5.76 5.96 -2.27 19.33 3.01 29.60 10.64 12.79 6.22 30.99
z 9.83 -4.61 0.69 0.25 -0.56 2.35 2.28 -0.82 7.06 8.53 -0.52 3.29 -2.42 8.85 20.34 12.08 4.22 4.48 2.57 4.33
B 61.06 -42.37 3.02 -0.11 -3.05 56.39 36.12 14.63 22.86 19.43 33.25 19.98 23.90 14.39 4.25 204.51 70.57 72.24 51.54 0.67
b 38.59 -32.48 -8.39 -14.33 -4.12 50.59 17.99 24.07 35.23 41.38 27.86 27.48 12.74 19.47 9.57 247.85 145.67 44.35 48.16 4.51
Q 35.97 -24.81 -5.29 -9.96 -3.80 112.76 21.46 12.99 19.11 17.75 23.45 15.62 7.47 8.74 2.70 81.08 67.37 126.97 50.84 5.15
q 18.26 -22.93 -2.71 -5.39 -0.10 54.20 17.40 22.01 23.42 28.37 23.34 14.49 6.52 7.45 3.91 41.28 40.34 173.97 135.55 5.06
_ 54.44 -16.60 0.96 28.90 14.56 0.59 9.51 3.69 7.01 0.35 9.44 1.59 11.49 3.65 0.14 0.87 -1.84 4.33 -0.49 65.46

Example: [BQC(3)_][bqD(5)]

Some Observations

• Self-correlations suggest habitual tendencies

• Substantive QTs rarely follow or precede page viewing; they are associated with active searching.

• A delay is followed by N (a new query), R or I, suggesting a memory refresh.

Number of words/query: Excite 2001

[Figure: normalised frequency (0 to 0.35) against terms per query (log scale, 1 to 100).]

Hölscher & Strube (2000): Graphical Representation

Close-up of direct interaction with a search engine: numbers show transition probabilities.

Experts and novices doing specific search tasks

Word Similarity

[Figure: "elected" and "election" with a row of 0/1 match flags, shown at successive shifts.]

Shift the word along until the best match is found.

Logical AND: same letter.

Context

• How do the general public search the web?

• Experimental study
  – general public volunteers
  – record sound, screens, keystrokes

• Goal: evidence-based model of effective searching

Conclusions

• We have developed a rich set of codes describing the syntactic part of QTs

• These can be used to develop a graph-based description

• Correlations between the codes are meaningful and interesting

• They will form part of the analysis for our experimental study.

Page 137: Improving the effectiveness of Web searching: Methodological issues Barry Eaglestone Department of Information Studies University of Sheffield B.Eaglestone@shef.ac.uk.

Acknowledgments

• Arts and Humanities Research Council (formerly Board) for funding

• Mark Sanderson and Amanda Spink for making the Excite logs available

Page 138: Improving the effectiveness of Web searching: Methodological issues Barry Eaglestone Department of Information Studies University of Sheffield B.Eaglestone@shef.ac.uk.

Questions ?

Page 139: Improving the effectiveness of Web searching: Methodological issues Barry Eaglestone Department of Information Studies University of Sheffield B.Eaglestone@shef.ac.uk.

Setting WST and QST

excite: WST = 0.4

0

50000

100000

150000

200000

250000

300000

350000

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Query Transformation

Fre

qu

ency

Tot New

Tot Mod

z+Z

Page 140: Improving the effectiveness of Web searching: Methodological issues Barry Eaglestone Department of Information Studies University of Sheffield B.Eaglestone@shef.ac.uk.

Inter-QT correlations

• f(A | B): measured frequency of code B following code A

• E{ }: expected value

• V{ }: variance

$$D_{ij}(A \mid B) = \frac{f_{ij}(A \mid B) - E\{f_{ij}(A \mid B)\}}{\sqrt{V\{f_{ij}(A \mid B)\}}}$$

Correlations of one transform with the next.
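The slide gives the standardised form of D(A|B) but does not say how E{} and V{} are estimated, nor what the i, j subscripts index. The sketch below fills that gap with an assumption of its own, not necessarily the authors' method: prior and posterior codes are treated as independent, so the count of B immediately following A over n transitions is binomial with probability P(A)P(B). Sessions are represented as strings of single-character codes, the subscripts are dropped, and the function name `transition_scores` is hypothetical.

```python
import math
from collections import Counter


def transition_scores(sessions):
    """Standardised score D(A|B) for every prior code A and posterior code B.

    Each session is a sequence of single-character QT codes; the pair (A, B)
    is counted whenever B immediately follows A in the same session.
    """
    pair_counts = Counter()
    prior_counts = Counter()
    posterior_counts = Counter()
    for codes in sessions:
        for a, b in zip(codes, codes[1:]):
            pair_counts[(a, b)] += 1
            prior_counts[a] += 1
            posterior_counts[b] += 1

    n = sum(pair_counts.values())  # total number of transitions
    scores = {}
    for a in prior_counts:
        for b in posterior_counts:
            # Assumed null model: A and B occur independently, so the pair
            # count is Binomial(n, p) with p = P(prior = A) * P(posterior = B).
            p = (prior_counts[a] / n) * (posterior_counts[b] / n)
            expected = n * p
            variance = n * p * (1 - p)
            observed = pair_counts[(a, b)]
            scores[(a, b)] = (
                (observed - expected) / math.sqrt(variance) if variance > 0 else 0.0
            )
    return scores
```

For example, `transition_scores(["NRRI", "NRN"])[("N", "R")]` gives the score for R following N in two short made-up sessions. A positive score means the pair occurs more often than the independence model predicts; a negative score means less often.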


Inter-QT correlations

Columns: prior transformation type. Rows: posterior transformation type.

         N      P      p      R      I      J      C      D      S      s      W      w      M      Z      z      B      b      Q      q      —
N    82.40 -39.20  -2.92  13.26  22.95   2.22  10.77   9.92   5.92  -2.37  23.22   2.37  30.55  11.24   3.99  22.86   8.45  17.84   6.60 102.17
P   -42.39 323.03   9.91 -15.98 -17.58  -9.12  -4.10  -6.83 -12.90  -5.45 -19.89  -5.75 -32.02  -8.76  -2.25 -25.01 -19.35 -18.59  -7.81 -71.47
p   -50.08  79.89 154.30  17.11   4.96  -8.42 -18.06 -10.74 -15.30 -10.98 -21.52 -11.35 -18.32  -8.35  -2.35 -21.79 -10.70 -17.30  -7.25  21.57
R   125.10 -85.27   3.73 198.05  23.30  -2.83   0.55  -2.51  -3.93  -6.24   1.94  -3.17  14.86   1.31  -0.46 -16.30 -12.24  -0.72  -6.71  89.80
I    -8.96 -39.39   7.11  25.19 152.36  23.27  35.60  20.45  19.44  10.92  33.41  15.91  61.29   5.88   1.04   0.33   6.43  -0.72   4.76  61.21
J    31.31 -28.13   0.42  -1.56  -2.36  45.43  29.05  12.92  21.68  19.21  15.47  15.55  10.37   7.08   4.06  66.72  37.31  70.63  46.88  -5.89
C    98.65 -27.61  -2.25  -7.92  -3.51   9.43  50.98  -1.42   2.57  -5.27  11.76  -2.43   7.80  10.78   1.98  33.37   6.34  25.51   3.16  -8.53
D    39.12 -24.03  -2.58  -3.66  -0.82  23.95  14.41  21.89  32.39  29.83  26.52  21.93  -4.62  11.31   4.55  45.21  24.60  57.86  14.67   5.58
S    35.67 -30.46  -3.62  -7.55   0.35  12.88  31.20  28.48 108.55  44.56  27.07  25.89  -6.91  26.54   5.79  56.24  35.14  39.28  17.90   6.90
s     8.44 -18.69  -2.58  -6.79  -1.78  15.49  43.13  15.71  59.83 117.15   1.57  34.34 -12.48  30.55  21.59  46.67  34.77  33.33  22.27   1.00
W    79.54 -43.79  -5.10  -9.05   4.91  15.72  16.39  32.98  10.95  -0.93 117.56  23.20  24.22  14.02  -0.47  70.07  38.85  46.57  17.34  27.82
w    17.74 -17.47  -2.16  -5.35   2.10  12.61  23.19  16.82  22.55  23.51  44.17  66.50  -2.25  18.13   3.57  39.50  35.21  26.21  14.42   6.21
M   109.09 -57.39  -6.00   0.68   8.81   4.55  -5.14   7.04 -11.05 -11.98   4.69  -7.25 160.36  -3.45  -2.86  31.61  14.40   9.17   4.19  31.52
Z    37.56 -13.24  -3.22  -0.98   1.32   6.09   9.11   5.53  17.10  13.88   5.76   5.96  -2.27  19.33   3.01  29.60  10.64  12.79   6.22  30.99
z     9.83  -4.61   0.69   0.25  -0.56   2.35   2.28  -0.82   7.06   8.53  -0.52   3.29  -2.42   8.85  20.34  12.08   4.22   4.48   2.57   4.33
B    61.06 -42.37   3.02  -0.11  -3.05  56.39  36.12  14.63  22.86  19.43  33.25  19.98  23.90  14.39   4.25 204.51  70.57  72.24  51.54   0.67
b    38.59 -32.48  -8.39 -14.33  -4.12  50.59  17.99  24.07  35.23  41.38  27.86  27.48  12.74  19.47   9.57 247.85 145.67  44.35  48.16   4.51
Q    35.97 -24.81  -5.29  -9.96  -3.80 112.76  21.46  12.99  19.11  17.75  23.45  15.62   7.47   8.74   2.70  81.08  67.37 126.97  50.84   5.15
q    18.26 -22.93  -2.71  -5.39  -0.10  54.20  17.40  22.01  23.42  28.37  23.34  14.49   6.52   7.45   3.91  41.28  40.34 173.97 135.55   5.06
—    54.44 -16.60   0.96  28.90  14.56   0.59   9.51   3.69   7.01   0.35   9.44   1.59  11.49   3.65   0.14   0.87  -1.84   4.33  -0.49  65.46

Example: [BQC(3)_][bqD(5)]


Some Observations

• Self-correlations suggest habitual tendencies: every diagonal entry of the inter-QT correlation table is positive.

• Substantive QTs rarely follow or precede page viewing; they are associated with active searching.

• A delay is typically followed by N (a new query), R or I, suggesting a memory refresh.
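The first and third observations can be checked directly against the inter-QT correlation table. The sketch below copies two slices of that table: the diagonal (each code followed by itself) and the column labelled —, read with columns as prior and rows as posterior transformation types. It assumes that — is the delay code referred to in the third observation; the slides do not state this explicitly. The names `CODES`, `DIAGONAL`, `AFTER_DELAY` and the helper functions are hypothetical.

```python
# Transformation codes in the order used by the inter-QT correlation table.
CODES = ["N", "P", "p", "R", "I", "J", "C", "D", "S", "s",
         "W", "w", "M", "Z", "z", "B", "b", "Q", "q", "—"]

# Diagonal entries (a code followed by the same code), copied from the table.
DIAGONAL = [82.40, 323.03, 154.30, 198.05, 152.36, 45.43, 50.98, 21.89, 108.55, 117.15,
            117.56, 66.50, 160.36, 19.33, 20.34, 204.51, 145.67, 126.97, 135.55, 65.46]

# Column "—" (prior code "—", assumed here to be the delay code),
# one entry per posterior code, copied from the table.
AFTER_DELAY = [102.17, -71.47, 21.57, 89.80, 61.21, -5.89, -8.53, 5.58, 6.90, 1.00,
               27.82, 6.21, 31.52, 30.99, 4.33, 0.67, 4.51, 5.15, 5.06, 65.46]


def self_correlations():
    """Map each code to its self-correlation score."""
    return dict(zip(CODES, DIAGONAL))


def strongest_successors(column, k, exclude=()):
    """Return the k posterior codes with the highest scores in one prior column."""
    ranked = sorted(zip(CODES, column), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked if code not in exclude][:k]


if __name__ == "__main__":
    print(min(self_correlations().values()))  # 19.33, so every diagonal entry is positive
    print(strongest_successors(AFTER_DELAY, 3, exclude=("—",)))  # ['N', 'R', 'I']
```

Excluding the delay code itself matters: its own self-correlation (65.46) would otherwise rank third, ahead of I.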


Number of words/query: Excite 2001

[Figure: normalised frequency (0 to 0.35) against terms/query, plotted on a logarithmic axis from 1 to 100.]
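A normalised terms-per-query distribution like the one plotted above can be computed with a few lines of Python. This is a minimal sketch that assumes terms are whitespace-separated tokens; the tokenisation actually applied to the Excite 2001 log may differ, and the function name and sample queries are made up for illustration.

```python
from collections import Counter


def terms_per_query_distribution(queries):
    """Normalised frequency of the number of terms per query.

    Terms are assumed to be whitespace-separated tokens; blank queries are skipped.
    """
    counts = Counter(len(q.split()) for q in queries if q.strip())
    total = sum(counts.values())
    return {n_terms: count / total for n_terms, count in sorted(counts.items())}


if __name__ == "__main__":
    sample = ["web search", "sheffield", "information studies sheffield", "web pages"]
    print(terms_per_query_distribution(sample))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

Dividing by the total number of non-blank queries makes the values sum to 1, which is what allows distributions from logs of different sizes to be compared on the same axis.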

