Centre for Content and Knowledge Engineering, Utrecht University, the Netherlands

Query Formulation for XML Retrieval with Bricks

Bricks: the building blocks to tackle query formulation issues in structured document retrieval

Roelof van Zwol

Jeroen Baas

Herre van Oostendorp

Frans Wiering

Outline
Motivation
Objectives with Bricks
Theory behind Bricks
Usability experiment
Conclusions

Motivation
Structured document retrieval (XML retrieval) is different:

Query formulation
A search can contain both structural and textual conditions.

Retrieval strategy
Exploit the document structure to retrieve relevant document fragments.

Result presentation
Presenting individual document fragments, or clustered fragments (browse and fetch), requires new navigation techniques.

Running example:
A user planning his holiday has the following specific information need:

“Find information about traveling to destinations with major airports, where the weather has a tropical climate.”

Based on the Lonely Planet collection.

Legend:
Structural conditions

Textual conditions

Query formulation


Keyword based (NEXI-CO)

Natural Language Processing

Combination of keyword- and structure-based search (NEXI-CAS)


Query formulation
NEXI-Content Only:

Keyword-based query, containing words and phrases.
User is ignorant of document structure.
Any document or XML document fragment can be returned.

Information request:
"major airport" destination weather "tropical climate"
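
As a minimal sketch of how such a content-only request breaks down into search terms, the Python snippet below splits a NEXI-CO string into quoted phrases and single words. The function name `parse_co_query` is hypothetical, and the snippet only illustrates the shape of the query; it is not how any particular retrieval engine parses it.

```python
from __future__ import annotations

import re


def parse_co_query(query: str) -> list[str]:
    """Split a content-only query into quoted phrases and single words."""
    # Normalise typographic quotes so phrases can be matched uniformly.
    query = query.replace("\u201c", '"').replace("\u201d", '"')
    terms = []
    # A term is either a double-quoted phrase or a run of non-space characters.
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        term = phrase or word
        if term:
            terms.append(term)
    return terms


print(parse_co_query('"major airport" destination weather "tropical climate"'))
# → ['major airport', 'destination', 'weather', 'tropical climate']
```

No structural element appears anywhere in the term list, which is why any document or fragment can come back as a result.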

Query formulation
Natural language processing:

Complex formulation, including structure, is very well possible in theory.
Close to user’s “mental model”.

User needs to know about structure.

Brisbane’s NLP2NEXI engine:
Kindly provided by [Geva et al.]
We ran into conversion problems with respect to recursive structure and the generation of complex NEXI queries.

Therefore not included in the experiment.

Information request:
Find information about traveling to destinations with major airports, where the weather has a tropical climate.

Query formulation
NEXI - Content and Structure:

Query contains structural requirements.
User is aware of document structure, and …
User is capable of specifying structural constraints.

Powerful query language for structured document retrieval.

Information request:
//destination[about(.//weather, "tropical climate")]//getting_there_and_away[about(., "major airport")]
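
To make the shape of a content-and-structure query concrete, the following Python sketch assembles the running example from two path steps, each carrying an `about()` clause. The helper names `about` and `step` are hypothetical and exist only for this illustration; the sketch covers just the subset of NEXI used on this slide.

```python
def about(path: str, text: str) -> str:
    """Render one NEXI about() clause for a relative path and a phrase."""
    return f'about({path}, "{text}")'


def step(element: str, *clauses: str) -> str:
    """Render one descendant step, with optional about() clauses as a filter."""
    predicate = "[" + " and ".join(clauses) + "]" if clauses else ""
    return f"//{element}{predicate}"


# Structural conditions are the element names, textual conditions the phrases.
query = (
    step("destination", about(".//weather", "tropical climate"))
    + step("getting_there_and_away", about(".", "major airport"))
)
print(query)
```

The last step names the element to be returned, so the user has to write the filtering context first and the requested element last.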

Query formulation

[Figure: query formulation approaches set against task complexity; surviving labels: Task complexity, Bricks (?), Natural Language Processing]

Objectives with Bricks
Minimize the complexity of the query formulation process

Minimize the required knowledge of the document structure

Maximize the expressive power as provided by NEXI

Running example in Bricks

Theory behind Bricks
Graphical approach
Intuition of a mental model
Building blocks
Avoid information overload

Graphical approach
Reduces syntactical formulation issues.

Bricks is NEXI-compatible.

Reduces/eliminates required knowledge of document structure.

Bricks uses pull-down lists.

Alternative in development: TreeSearch.

Avoids formulation of malformed information requests.

Bricks uses extensive checks for both query syntax and structure.

Referred to in literature as: Direct manipulation [Preece et al.]
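
As an illustration of the kind of syntax check a formulation tool can run before submitting a query, the sketch below tests whether the brackets in a NEXI string are balanced. This is an assumed, minimal example of a check; the slides do not describe which checks Bricks actually performs, and the function name `brackets_balanced` is hypothetical.

```python
PAIRS = {")": "(", "]": "["}


def brackets_balanced(query: str) -> bool:
    """Return True if ( ) and [ ] are properly nested outside quoted phrases."""
    stack = []
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes   # toggle on every straight double quote
        elif in_quotes:
            continue                    # ignore brackets inside a phrase
        elif ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            # A closing bracket must match the most recent opening bracket.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack and not in_quotes


print(brackets_balanced('//destination[about(.//weather, "tropical climate")]'))
print(brackets_balanced('//getting_there_and_away[about(., "major airport"]'))
```

With a graphical interface the user never types brackets at all, so this class of error cannot occur in the first place.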

Intuition of a mental model
A user has a mental model of the information he is looking for.

Effectiveness and efficiency of the user increase when the formulation process is close to the user’s mental model.

User thinks in ‘natural language’…
… and is likely to specify the requested element of retrieval first:

“Find information about traveling to…”

… NEXI specification is not user oriented, but structure oriented…

Building blocks

In Bricks, the formulation process is split into small building blocks that a user needs to complete to specify his information need.

Similar theory as used for form-wizards.

Avoid information overload

The risk of too many options…

Syntax: allow a minimal (logical) set of next steps.

Structure: present a limited (logical) set of structural elements. (LP: 271 unique elements)
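
One way to picture the limited set of structural elements is a pull-down list that offers only the children of the element chosen so far, with layout tags hidden. The Python sketch below is built on assumptions: the `STRUCTURE` mapping is a hypothetical excerpt (only the element names from the running example come from the slides, the nesting is invented), and hiding layout tags such as `p`, `b` and `i` is an assumed filtering rule, not a documented Bricks feature.

```python
from __future__ import annotations

# Hypothetical excerpt of a collection structure: parent -> child elements.
STRUCTURE = {
    "destination": ["weather", "getting_there_and_away", "p"],
    "weather": ["p", "b", "i"],
    "getting_there_and_away": ["p", "b", "i"],
}

# Assumed set of layout tags that carry no meaning for query formulation.
LAYOUT_TAGS = {"p", "b", "i"}


def next_elements(current: str) -> list[str]:
    """Return the elements to offer after `current`, hiding layout tags."""
    children = STRUCTURE.get(current, [])
    return [child for child in children if child not in LAYOUT_TAGS]


print(next_elements("destination"))
# → ['weather', 'getting_there_and_away']
```

The user then chooses from a handful of relevant options at each step instead of from every element name in the collection.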






<p> <b> <i>



Usability experiment
Hypothesis
Setup
Results
Task complexity
Observations

Use of sophisticated query formulation techniques will lead to a higher effectiveness of the task performance.

NEXI-CAS and Bricks should perform significantly better than NEXI-CO with respect to successful task completion.

The difference in effectiveness is dependent on the task complexity.

The Bricks approach for query formulation will increase the efficiency of the user for a given task.

When time is taken into account with respect to effectiveness, we expect that users will need significantly less time to (successfully) complete the task when using NEXI-CO and Bricks.

Setup
Document collection: Lonely Planet.

Systems: NEXI-CO, NEXI-CAS, and Bricks.

Users: 54 MIR students, trained in NEXI and SDR.

Topics: 27 topics, sorted into three complexity groups.

Survey: before and after the experiment the participants filled in a survey (satisfaction).

Experience: before the experiment the users practiced with the TERS interface and search engines (reduces learning effects).

Overall results

Effectiveness: significant difference between Bricks and NEXI-CAS vs. NEXI-CO. (H1)

Efficiency: Bricks most efficient (significant difference). (H2)

Time: users need significantly more time to formulate their information need with NEXI-CAS.

Satisfaction: no significant difference, but NEXI-CAS preferred, followed by Bricks.









Task complexity


Completely different working procedures:

NEXI-CO: formulate query, inspect results and refine. (many iteration steps)

NEXI-CAS: step-wise construction and validation of NEXI query, submissions to check syntax. (few iteration steps)

Bricks: longer construction time, almost no iterations, submit to check results.

Conclusions
Users need to be able to use the structure of a document adequately to make SDR work in practice.

Three objectives for query formulation:
Adequate expressive power

Minimize syntactical formulation problems

Minimize required knowledge of document structure

Bricks: graphical approach, intuition of mental model, building blocks, avoid information overload. (Online demo available)

Sophisticated query formulation techniques have a positive influence on task performance (effectiveness of NEXI-CAS and Bricks).

Bricks is more efficient, since it allows users to successfully perform their task in a shorter amount of time.

Task complexity vs. effectiveness: negative correlation, but Bricks and NEXI-CAS are more effective for medium- and high-complexity tasks.

Task complexity vs. efficiency: strong negative correlation.

NEXI-CAS is sensitive with respect to time needed for task completion.

