Advanced Query Parsing Techniques

Post on 18-Dec-2014

description

This presentation, given at the November 2013 Basis Technologies' Open Source Search Conference, reviews the role that advanced query parsing can play in building search systems, including relevancy customization; taking input from user interface variables, such as the position on a website or geographical indicators; choosing which sources are to be searched; and drawing on third-party data sources. Query parsing can also enhance data security. Best practices for building and maintaining complex query parsing rules are discussed and illustrated. http://www.searchtechnologies.com/query-parsing-language.html

transcript

Advanced Query Parsing Techniques

Aruna Kumar Pamulapati (Arun)
Technical Consultant

2 The expert in the search space

Search Technologies Overview

Formed June 2005
Over 100 employees and growing
Over 500 customers worldwide
Presence in US, Latin America, UK & Germany
Deep enterprise search expertise
Consistent revenue growth and profitability
Search Engine Independent

Lucene Relevancy: Simple Operators

term(A): TF(A) * IDF(A)
    Implemented with DefaultSimilarity / TermQuery
    TF(A) = sqrt(termInDocCount)
    IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0

and(A, B): A * B
    Implemented with BooleanQuery()

or(A, B): A + B
    Implemented with BooleanQuery()

max(A, B): max(A, B)
    Implemented with DisjunctionMaxQuery()
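The term score above can be checked by hand. The following is a minimal Python sketch of the two formulas exactly as written on the slide, not Lucene's own code; the function names and the sample counts are made up for illustration, and the natural logarithm is assumed.

```python
import math

def tf(term_in_doc_count):
    # TF(A) = sqrt(termInDocCount)
    return math.sqrt(term_in_doc_count)

def idf(total_docs_in_collection, docs_with_term_count):
    # IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
    # Assumption: "log" is the natural logarithm.
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0

def term_score(term_in_doc_count, total_docs_in_collection, docs_with_term_count):
    # term(A) = TF(A) * IDF(A)
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)

# Hypothetical numbers: the term occurs 4 times in the document and in
# 9 of the 1000 documents in the collection.
score = term_score(4, 1000, 9)
rare = idf(1000, 9)
common = idf(1000, 99)
print(score, rare, common)
```

With these definitions a term that appears in fewer documents gets a larger IDF, so `rare` is greater than `common`.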

Simple Operators - Example

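Query tree: and( or(george, martha), max(washington, custis) )

Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90

or(george, martha) = 0.1 + 0.2 = 0.30
max(washington, custis) = max(0.6, 0.9) = 0.90
and(0.30, 0.90) = 0.3 * 0.9 = 0.27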
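The arithmetic of this example can be reproduced in a few lines. This is a sketch in Python that treats each operator as the plain score function given on the previous slide (and = product, or = sum, max = maximum); the helper names are invented here and nothing below calls Lucene.

```python
def and_(*scores):
    # and(A, B) = A * B
    product = 1.0
    for s in scores:
        product *= s
    return product

def or_(*scores):
    # or(A, B) = A + B
    return sum(scores)

def max_(*scores):
    # max(A, B) = max(A, B)
    return max(scores)

# Term scores taken from the slide.
scores = {"george": 0.10, "martha": 0.20, "washington": 0.60, "custis": 0.90}

left = or_(scores["george"], scores["martha"])
right = max_(scores["washington"], scores["custis"])
total = and_(left, right)

print(round(left, 2), round(right, 2), round(total, 2))  # prints: 0.3 0.9 0.27
```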

Less Used Operators

boost(f, A): (A * f)
    Implemented with Query.setBoost(f)

constant(f, A): if(A) then f else 0.0
    Implemented with ConstantScoreQuery()

boostPlus(A, B): if(A) then (A + B) else 0.0
    Implemented with BooleanQuery()

boostMul(f, A, B): if(B) then (A * f) else A
    Implemented with BoostingQuery()
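The four definitions above read naturally as small score functions. The sketch below is an illustration in Python under one stated assumption: a sub-query score of 0.0 means "did not match". The function names mirror the slide but are not an API of Lucene or QPL.

```python
def boost(f, a):
    # boost(f, A) = A * f
    return a * f

def constant(f, a):
    # constant(f, A) = f if A matched, else 0.0
    return f if a > 0.0 else 0.0

def boost_plus(a, b):
    # boostPlus(A, B): A is required, B only adds to the score
    return a + b if a > 0.0 else 0.0

def boost_mul(f, a, b):
    # boostMul(f, A, B): when B matches, A's score is multiplied by f
    return a * f if b > 0.0 else a

print(boost(2.0, 0.5))           # A scaled by a factor
print(constant(4.0, 0.3))        # any match scores exactly f
print(boost_plus(0.5, 0.25))     # optional B raises the score
print(boost_plus(0.0, 0.25))     # B alone never matches
print(boost_mul(0.5, 0.8, 0.1))  # B present: A is demoted by f = 0.5
```

Note that boostMul with f below 1.0 demotes documents matching B without excluding them.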

Problem: Need for More Flexibility

Difficult / impossible to use all operators
    Many not available in standard query parsers

Complex expressions = string manipulation
    This is messy

Query construction is in the application layer
    Your UI programmer is creating query expressions? Seriously?

Hard to create and use new operators
    Requires modifying query parsers - yuck

Query Processing Language

[Diagram. Labels: Solr, User Interface, QPL Engine, Search, QPL Script]

Introducing: QPL

Query Processing Language
    Domain Specific Language for Constructing Queries
    Built on Groovy
    https://wiki.searchtechnologies.com/index.php/QPL_Home_Page

Solr Plug-Ins
    Query Parser
    Search Component

“The 4GL for Text Search Query Expressions”

Server-side Solr Access
    Cores, Analyzers, Embedded Search, Results XML

Solr Plug-Ins

QPL Configuration – solrconfig.xml

Query Parser Configuration:

<queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>

Search Component Configuration:

<searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>

QPL Example #1

Tokenize:
myTerms = solr.tokenize(query);

Phrase Query:
phraseQ = phrase(myTerms);

And Query:
andQ = and(myTerms);

Or Query:
orQ = (myTerms.size() <= 2) ? null : orMin( (myTerms.size()+1)/2, myTerms);

Put It All Together:
return phraseQ^3.0 | andQ^2.0 | orQ;
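To see what the three sub-queries are meant to catch, here is a small Python model of the script. It is a sketch only, built on assumptions the slide does not spell out: that orMin(n, terms) matches documents containing at least n of the terms, that the threshold (size + 1) / 2 is rounded down to a whole number, and that documents are plain token lists. The function name matching_tiers is invented for this illustration.

```python
def matching_tiers(query_terms, doc_tokens):
    """Return which of the sub-queries (phrase, and, orMin) a document satisfies."""
    tiers = []
    n = len(query_terms)

    # phrase(myTerms): all terms adjacent and in order
    if any(doc_tokens[i:i + n] == query_terms
           for i in range(len(doc_tokens) - n + 1)):
        tiers.append("phrase")

    # and(myTerms): every term present somewhere in the document
    present = sum(1 for term in query_terms if term in doc_tokens)
    if present == n:
        tiers.append("and")

    # orMin(...): only built for more than two terms; assumed to require
    # at least (n + 1) // 2 of the terms
    if n > 2 and present >= (n + 1) // 2:
        tiers.append("orMin")

    return tiers

query = ["open", "source", "search"]
print(matching_tiers(query, "the open source search conference".split()))
print(matching_tiers(query, "search tools that are open".split()))
print(matching_tiers(["george", "washington"], ["washington"]))
```

In the script the phrase and "and" matches are boosted by 3.0 and 2.0, so a document satisfying more tiers is expected to rank higher than one that only passes the orMin threshold.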

Thesaurus Example #2

Tokenize:
myTerms = solr.tokenize(query);

Load Thesaurus: (cached)
thes = Thesaurus.load("thesaurus.xml")

Thesaurus Expansion:
thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);

Put It All Together:
return and(thesQ);

Original Query: bathroom humor
[or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
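The expansion result shown for "bathroom humor" can be mimicked with a dictionary lookup. This is an illustrative Python sketch, not the QPL Thesaurus class: the expand function and the in-memory thesaurus below are stand-ins invented for the example, and queries are rendered as strings in the slide's notation.

```python
def expand(weight, thesaurus, terms):
    """Replace each term with or(term, synonym^weight, ...) when synonyms exist."""
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if not synonyms:
            # No entry in the thesaurus: keep the term unchanged.
            expanded.append(term)
            continue
        alternatives = [term] + [f"{syn}^{weight}" for syn in synonyms]
        expanded.append("or(" + ", ".join(alternatives) + ")")
    return expanded

# Hypothetical thesaurus content matching the slide's example.
thesaurus = {"bathroom": ["loo", "wc"], "humor": ["jokes"]}

result = expand(0.8, thesaurus, ["bathroom", "humor"])
print(result)  # prints: ['or(bathroom, loo^0.8, wc^0.8)', 'or(humor, jokes^0.8)']
```

The original term keeps full weight while each synonym is down-weighted, so exact matches are preferred over thesaurus matches.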

More Operators

Boolean Query Parser:
pQ = parseQuery("(george or martha) near/5 washington")

Relevancy Ranking Operators:
q1 = boostPlus(query, optionalQ)
q2 = boostMul(0.5, query, optionalQ)
q3 = constant(0.5, query)

Composite Queries:
compQ = and(compositeMax(["title":1.5, "body":0.8], "george", "washington"))

News Feed Use Case

Order  Documents            Date
1      markets+terms        Today
2      markets              Today
3      terms                Today
4      companies            Today
5      markets+terms        Yesterday
6      markets              Yesterday
7      terms                Yesterday
8      companies            Yesterday
9      markets, companies   older

News Feed Use Case – Step 1

Segments:
markets = split(solr.markets, "\\s*;\\s*")
marketsQ = field("markets", or(markets));

Terms:
terms = solr.tokenize(query);
termsQ = field("body", or(thesaurus.expand(0.9f, terms)))

Companies:
compIds = split(solr.compIds, "\\s*;\\s*")
compIdsQ = field("companyIds", or(compIds))

News Feed Use Case – Step 2

sdf = new SimpleDateFormat("yyyy-MM-dd")
cal = Calendar.getInstance()

Today:
todayDate = sdf.format(cal.getTime())
todayQ = field("date_s", todayDate)

Yesterday:
cal.add(Calendar.DAY_OF_MONTH, -1)
yesterdayDate = sdf.format(cal.getTime())
yesterdayQ = field("date_s", yesterdayDate)
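The date handling in this step amounts to formatting today and the day before as yyyy-MM-dd strings. A rough Python equivalent, for illustration only (the day_strings helper and the sample date are invented here), shows that the one-day subtraction also works across a month boundary:

```python
from datetime import date, timedelta

def day_strings(today):
    """Return (today, yesterday) formatted as yyyy-MM-dd strings."""
    yesterday = today - timedelta(days=1)
    return today.strftime("%Y-%m-%d"), yesterday.strftime("%Y-%m-%d")

# Sample date chosen to cross a month boundary.
today_s, yesterday_s = day_strings(date(2013, 11, 1))
print(today_s, yesterday_s)  # prints: 2013-11-01 2013-10-31
```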

News Feed Use Case – Step 3

Weighted Subject Queries:
sq1 = constant(4.0, and(marketsQ, termsQ))
sq2 = constant(3.0, marketsQ)
sq3 = constant(2.0, termsQ)
sq4 = constant(1.0, compIdsQ)
subjectQ = max(sq1, sq2, sq3, sq4)

Weighted Time Queries:
tq1 = constant(10.0, todayQ)
tq2 = constant(1.0, yesterdayQ)
timeQ = max(tq1, tq2)

Put it All Together:
recentQ = and(subjectQ, timeQ)
return max(recentQ, or(marketsQ, compIdsQ)^0.01)
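The constants in this step can be checked against the nine-row ordering from the use case. The Python sketch below uses the operator definitions from the earlier slides (and = product, max = maximum, constant = fixed score on match) and one loud assumption: in the low-weight fallback, every matching sub-query of or(marketsQ, compIdsQ) is taken to score 1.0, whereas real term scores would vary. The news_score function is a made-up name for this illustration.

```python
def constant(f, matched):
    # constant(f, A): f if A matches, else 0.0
    return f if matched else 0.0

def news_score(markets, terms, companies, today, yesterday):
    # subjectQ = max(sq1, sq2, sq3, sq4)
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    # timeQ = max(tq1, tq2)
    time = max(constant(10.0, today), constant(1.0, yesterday))
    # recentQ = and(subjectQ, timeQ), with and(A, B) = A * B
    recent = subject * time
    # or(marketsQ, compIdsQ)^0.01, assuming each match scores 1.0
    fallback = 0.01 * (float(markets) + float(companies))
    return max(recent, fallback)

rows = [
    ("1 markets+terms, today", news_score(True, True, False, True, False)),
    ("2 markets, today", news_score(True, False, False, True, False)),
    ("3 terms, today", news_score(False, True, False, True, False)),
    ("4 companies, today", news_score(False, False, True, True, False)),
    ("5 markets+terms, yesterday", news_score(True, True, False, False, True)),
    ("6 markets, yesterday", news_score(True, False, False, False, True)),
    ("7 terms, yesterday", news_score(False, True, False, False, True)),
    ("8 companies, yesterday", news_score(False, False, True, False, True)),
    ("9 markets, older", news_score(True, False, False, False, False)),
]
for label, score in rows:
    print(label, score)
```

Because the time weight for today (10.0) exceeds the largest subject weight (4.0), every document from today outranks every document from yesterday, and older documents only surface through the 0.01 fallback.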

BT RLP Tokenizer Use Case – Step 1

Define field type:

<tokenizer class="com.basistech.rlp.solr.RLPTokenizerFactory"
           rlpContext="<PATH>rlp-context-bl1.xml"
           postAltLemmas="false"
           lang="eng"
           postPartOfSpeech="false"/>

QPL Expansion:

finalExpandedQuery = transform(queryTerms, [
  TERM: { ctx ->
    def btCustomTokens = solr.tokenize("subject_bt", ctx.op.term)
    if (btCustomTokens.size() > 1)
      return or(term(btCustomTokens[0])^1.5, or(btCustomTokens[1..-1]));
    else
      return ctx.op;
  }
]);

BT RLP Tokenizer Use Case – Step 2

Original User Query: following is "presentation on QPL"

QPL Parsed: and(and(term(following),term(is)), phrase(term(presentation),term(on),term(QPL)))

BT Expansion + QPL Transformation:
and(and(or(term(following)^1.5,term(follow)),or(term(is)^1.5,term(be))),phrase(term(presentation),term(on),term(QPL)))
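The transformation above is a recursive walk over the query tree that rewrites only the term nodes. The following Python sketch illustrates that idea with tuples as tree nodes; it is not the QPL transform implementation. The LEMMAS table is a hypothetical stand-in for what the RLP tokenizer returns, and the transform, expand_term, and render functions are names chosen for this example.

```python
# Hypothetical stand-in for the tokenizer: extra tokens (lemmas) per word.
LEMMAS = {"following": ["follow"], "is": ["be"]}

def expand_term(node):
    """TERM handler: boost the original term and or it with its lemmas."""
    alternates = LEMMAS.get(node[1], [])
    if alternates:
        return ("or", ("boost", 1.5, node), *[("term", a) for a in alternates])
    return node

def transform(node, handlers):
    """Apply handlers by node type, recursing into and/or/phrase children."""
    op = node[0]
    if op in handlers:
        return handlers[op](node)
    if op in ("and", "or", "phrase"):
        return (op, *[transform(child, handlers) for child in node[1:]])
    return node

def render(node):
    """Print a tree in the notation used on the slide."""
    op = node[0]
    if op == "term":
        return f"term({node[1]})"
    if op == "boost":
        return f"{render(node[2])}^{node[1]}"
    return f"{op}(" + ",".join(render(child) for child in node[1:]) + ")"

parsed = ("and",
          ("and", ("term", "following"), ("term", "is")),
          ("phrase", ("term", "presentation"), ("term", "on"), ("term", "QPL")))

expanded = transform(parsed, {"term": expand_term})
print(render(expanded))
```

Terms inside the phrase pass through unchanged here because the stand-in table has no extra tokens for them, which matches the expanded query shown on the slide.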

BT RLP Tokenizer Use Case – Step 3

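[Diagram: tree view of the expanded query from Step 2. The root "and" joins a nested "and" and a "phrase" node ("Presentation on QPL"). Under the nested "and", "Following" and "is" are each replaced by an "or" of the original term boosted ^1.5 and its lemma ("follow", "be").]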

Embedded Search Example #1

qTerms = solr.tokenize(qTerms);

Execute an Embedded Search:
results = solr.search('subjectsCore', or(qTerms), 50)

Create a query from the results:
subjectsQ = or(results*.subjectId)

Put it all together:
return field("title", and(qTerms)) | subjectsQ^0.9;

Embedded Search Example #2

qTerms = solr.tokenize(qTerms);

Execute an Embedded Search:
results = solr.search('categories', and(qTerms), 10)

Create a Solr named list:
myList = solr.newList();
myList.add("relatedCategories", results*.title);

Add it to the XML response:
solr.addResponse(myList)

Other Features

Embedded Grouping Queries
    Oh yes they did!

Proximity operators
    ADJ, NEAR/#, BEFORE/#

Reverse Lemmatizer
    Prefers exact matches over variants

Transformer
    Applies transformations recursively to query trees

Query Processing Language

[Diagram. Labels: Solr, User Interface, QPL Engine, Search, QPL Script, "Data as entered by user", "Boolean Query Expression", Application Dev Team, Search Team]

Query Processing Language

[Diagram. Labels: Solr, User Interface, QPL Engine, Search, QPL Script, RDBMS, Other Indexes, Thesaurus]

More on QPL…

http://www.searchtechnologies.com/query-parsing-language.html

THANK YOU

Contact: apamulapati@searchtechnologies.com
www.searchtechnologies.com