Date post: | 16-May-2015 |
Category: |
Technology |
Upload: | search-technologies |
Advanced Relevancy Ranking
Paul Nelson, Chief Architect / Search Technologies
2. Search Technologies Overview
• Formed June 2005
• Over 100 employees and growing
• Over 400 customers worldwide
• Presence in US, Latin America, UK & Germany
• Deep enterprise search expertise
• Consistent revenue growth and profitability
• Search Engine Independent
3. Lucene Relevancy: Simple Operators
• term(A) = TF(A) * IDF(A)
  • Implemented with DefaultSimilarity / TermQuery
  • TF(A) = sqrt(termInDocCount)
  • IDF(A) = log(totalDocsInCollection / (docsWithTermCount + 1)) + 1.0
• and(A, B) = A * B
  • Implemented with BooleanQuery()
• or(A, B) = A + B
  • Implemented with BooleanQuery()
• max(A, B) = max(A, B)
  • Implemented with DisjunctionMaxQuery()
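As a rough illustration of the term score above, the following Python sketch evaluates TF(A) * IDF(A) from raw counts. The function names are made up for this example, the logarithm is assumed to be the natural log, and other factors that Lucene's full scoring applies (such as field-length normalization) are left out.

```python
import math


def tf(term_in_doc_count):
    """Term frequency factor: square root of the in-document count."""
    return math.sqrt(term_in_doc_count)


def idf(total_docs_in_collection, docs_with_term_count):
    """Inverse document frequency, following the formula on the slide."""
    return math.log(total_docs_in_collection / (docs_with_term_count + 1)) + 1.0


def term_score(term_in_doc_count, total_docs_in_collection, docs_with_term_count):
    """Simplified term(A) score: TF(A) * IDF(A)."""
    return tf(term_in_doc_count) * idf(total_docs_in_collection, docs_with_term_count)


# A term that appears 4 times in a document and in 9 of 1000 documents.
rare = term_score(4, 1000, 9)
# The same in-document count for a term found in 499 of 1000 documents.
common = term_score(4, 1000, 499)
print(rare > common)  # rarer terms receive the higher score
```

The comparison at the end shows the intended effect: for equal in-document counts, the term that occurs in fewer documents contributes more to the score.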
4. Simple Operators - Example
Query tree: and( or(george, martha), max(washington, custis) )
Term scores: george = 0.10, martha = 0.20, washington = 0.60, custis = 0.90
or: 0.1 + 0.2 = 0.30
max: max(0.6, 0.9) = 0.90
and: 0.3 * 0.9 = 0.27
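The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch of the simplified operator semantics stated on the previous slide (or = sum, and = product, max = maximum); the helper names are invented for the illustration and are not Lucene or QPL calls.

```python
from functools import reduce


def or_(*scores):
    """or(A, B) = A + B"""
    return sum(scores)


def and_(*scores):
    """and(A, B) = A * B"""
    return reduce(lambda a, b: a * b, scores, 1.0)


def max_(*scores):
    """max(A, B) = max(A, B)"""
    return max(scores)


# Term scores taken from the slide.
george, martha, washington, custis = 0.10, 0.20, 0.60, 0.90

names_score = or_(george, martha)          # 0.1 + 0.2
surname_score = max_(washington, custis)   # max(0.6, 0.9)
total = and_(names_score, surname_score)   # 0.3 * 0.9

print(round(total, 2))  # 0.27
```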
5. Less Used Operators
• boost(f, A) = (A * f)
  • Implemented with Query.setBoost(f)
• constant(f, A) = if(A) then f else 0.0
  • Implemented with ConstantScoreQuery()
• boostPlus(A, B) = if(A) then (A + B) else 0.0
  • Implemented with BooleanQuery()
• boostMul(f, A, B) = if(B) then (A * f) else A
  • Implemented with BoostingQuery()
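The four definitions above translate almost directly into code. In this hedged Python sketch a score of 0.0 stands for "the sub-query did not match"; that convention and the function names are assumptions made for the example, not part of Lucene.

```python
def boost(f, a):
    """boost(f, A) = A * f"""
    return a * f


def constant(f, a):
    """constant(f, A) = f if A matched, else 0.0"""
    return f if a > 0.0 else 0.0


def boost_plus(a, b):
    """boostPlus(A, B) = A + B if A matched, else 0.0 (B is optional)."""
    return a + b if a > 0.0 else 0.0


def boost_mul(f, a, b):
    """boostMul(f, A, B) = A * f if B matched, else A unchanged."""
    return a * f if b > 0.0 else a


# B alone never produces a hit under boostPlus: A is required.
print(boost_plus(0.0, 0.2))      # 0.0
# With f below 1.0, boostMul demotes documents that also match B.
print(boost_mul(0.5, 0.4, 0.1))  # 0.2
```

Note the asymmetry: boostPlus rewards a match on the optional query, while boostMul with a factor below 1.0 can be used to push matching documents down.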
6. Problem: Need for More Flexibility
• Difficult / impossible to use all operators
  • Many not available in standard query parsers
• Complex expressions = string manipulation
  • This is messy
• Query construction is in the application layer
  • Your UI programmer is creating query expressions?
  • Seriously?
• Hard to create and use new operators
  • Requires modifying query parsers - yuck
7. Query Processing Language
[Diagram labels: Solr; User Interface; QPL Engine; QPL Script; Search]
8. Introducing: QPL
• Query Processing Language
  • Domain Specific Language for Constructing Queries
  • Built on Groovy
  • https://wiki.searchtechnologies.com/index.php/QPL_Home_Page
• Solr Plug-Ins
  • Query Parser
  • Search Component
• “The 4GL for Text Search Query Expressions”
• Server-side Solr Access
  • Cores, Analyzers, Embedded Search, Results XML
9. Solr Plug-Ins
10. QPL Configuration – solrconfig.xml
Query Parser Configuration:
<queryParser name="qpl" class="com.searchtechnologies.qpl.solr.QPLSolrQParserPlugin">
  <str name="scriptFile">parser.qpl</str>
  <str name="defaultField">text</str>
</queryParser>
Search Component Configuration:
<searchComponent name="qplSearchFirst" class="com.searchtechnologies.qpl.solr.QPLSearchComponent">
  <str name="scriptFile">search.qpl</str>
  <str name="defaultField">text</str>
  <str name="isProcessScript">false</str>
</searchComponent>
11. QPL Example #1
Tokenize:
  myTerms = solr.tokenize(query);
Phrase Query:
  phraseQ = phrase(myTerms);
And Query:
  andQ = and(myTerms);
Or Query:
  orQ = (myTerms.size() <= 2) ? null : orMin( (myTerms.size()+1)/2, myTerms);
Put It All Together:
  return phraseQ^3.0 | andQ^2.0 | orQ;
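The "Or Query" step only adds a relaxed clause when there are more than two terms, and then requires roughly half of them to match. The Python sketch below illustrates that idea. It assumes the threshold is rounded down to a whole number (how orMin treats a fractional value is not stated in the slides), and the helper names are hypothetical.

```python
def or_min_threshold(num_terms):
    """Minimum number of terms that must match, or None when no
    relaxed clause is built (two terms or fewer)."""
    if num_terms <= 2:
        return None
    return (num_terms + 1) // 2  # assumption: round down


def matches_or_min(query_terms, doc_terms):
    """True when enough query terms occur in the document."""
    threshold = or_min_threshold(len(query_terms))
    if threshold is None:
        return False
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= threshold


query_terms = ["george", "martha", "washington"]
print(or_min_threshold(len(query_terms)))                         # 2
print(matches_or_min(query_terms, {"george", "washington"}))      # True
print(matches_or_min(query_terms, {"george"}))                    # False
```

Combined with the boosts in the final expression, this yields a cascade: exact phrase matches rank first, documents with all terms next, and documents with at least half of the terms last.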
12. Thesaurus Example #2
Tokenize:
  myTerms = solr.tokenize(query);
Load Thesaurus: (cached)
  thes = Thesaurus.load("thesaurus.xml")
Thesaurus Expansion:
  thesQ = thes.expand(0.8f, solr.tokenizer("text"), myTerms);
  Original Query: bathroom humor
  Expanded: [or(bathroom, loo^0.8, wc^0.8), or(humor, jokes^0.8)]
Put It All Together:
  return and(thesQ);
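To make the expansion step concrete, here is a small Python sketch that produces the same list of expressions for "bathroom humor". The in-memory dictionary is a hypothetical stand-in for thesaurus.xml, and the expand function only mimics the output shown on the slide; it is not the QPL Thesaurus implementation.

```python
# Hypothetical stand-in for the contents of thesaurus.xml.
THESAURUS = {
    "bathroom": ["loo", "wc"],
    "humor": ["jokes"],
}


def expand(weight, terms, thesaurus):
    """Replace each term by or(term, synonym^weight, ...) when the
    thesaurus lists synonyms; leave other terms unchanged."""
    expanded = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if synonyms:
            parts = [term] + [f"{syn}^{weight}" for syn in synonyms]
            expanded.append("or(" + ", ".join(parts) + ")")
        else:
            expanded.append(term)
    return expanded


clauses = expand(0.8, ["bathroom", "humor"], THESAURUS)
print(clauses)
# ['or(bathroom, loo^0.8, wc^0.8)', 'or(humor, jokes^0.8)']
```

Weighting synonyms at 0.8 keeps documents containing the user's original word ahead of documents that only contain a synonym.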
13. More Operators
Boolean Query Parser:
  pQ = parseQuery("(george or martha) near/5 washington")
Relevancy Ranking Operators:
  q1 = boostPlus(query, optionalQ)
  q2 = boostMul(0.5, query, optionalQ)
  q3 = constant(0.5, query)
Composite Queries:
  compQ = and(compositeMax(["title":1.5, "body":0.8], "george", "washington"))
14. News Feed Use Case
Order | Documents          | Date
1     | markets+terms      | Today
2     | markets            | Today
3     | terms              | Today
4     | companies          | Today
5     | markets+terms      | Yesterday
6     | markets            | Yesterday
7     | terms              | Yesterday
8     | companies          | Yesterday
9     | markets, companies | older
15. News Feed Use Case – Step 1
Segments:
  markets = split(solr.markets, "\\s*;\\s*")
  marketsQ = field("markets", or(markets));
Terms:
  terms = solr.tokenize(query);
  termsQ = field("body", or(thesaurus.expand(0.9f, terms)))
Companies:
  compIds = split(solr.compIds, "\\s*;\\s*")
  compIdsQ = field("companyIds", or(compIds))
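The pattern "\\s*;\\s*" used in Step 1 splits a semicolon-separated value and swallows any whitespace around each separator. A minimal Python sketch of the same idea follows; the function name and the sample value are invented, and dropping empty entries is an assumption about the desired behavior.

```python
import re


def split_list(raw):
    """Split a semicolon-separated string, ignoring whitespace around
    the separators and discarding empty entries."""
    parts = re.split(r"\s*;\s*", raw.strip())
    return [part for part in parts if part]


print(split_list(" energy ; metals;tech ; "))
# ['energy', 'metals', 'tech']
```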
16. News Feed Use Case – Step 2
Setup:
  sdf = new SimpleDateFormat("yyyy-MM-dd")
  cal = Calendar.getInstance()
Today:
  todayDate = sdf.format(cal.getTime())
  todayQ = field("date_s", todayDate)
Yesterday:
  cal.add(Calendar.DAY_OF_MONTH, -1)
  yesterdayDate = sdf.format(cal.getTime())
  yesterdayQ = field("date_s", yesterdayDate)
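Step 2 boils down to producing two yyyy-MM-dd strings that are matched against the date_s field. The following Python sketch shows the equivalent date arithmetic; passing the current date in as a parameter is a choice made here so the example is reproducible, not something the slides prescribe.

```python
from datetime import date, timedelta


def day_strings(today):
    """Return (today, yesterday) formatted as yyyy-MM-dd strings."""
    yesterday = today - timedelta(days=1)
    return today.strftime("%Y-%m-%d"), yesterday.strftime("%Y-%m-%d")


today_str, yesterday_str = day_strings(date(2015, 3, 1))
print(today_str, yesterday_str)  # 2015-03-01 2015-02-28
```

In a live script, `date.today()` would be passed instead of a fixed date. Subtracting a whole day handles month and year boundaries, as the example shows.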
18. News Feed Use Case – Step 3
Weighted Subject Queries:
  sq1 = constant(4.0, and(marketsQ, termsQ))
  sq2 = constant(3.0, marketsQ)
  sq3 = constant(2.0, termsQ)
  sq4 = constant(1.0, compIdsQ)
  subjectQ = max(sq1, sq2, sq3, sq4)
Weighted Time Queries:
  tq1 = constant(10.0, todayQ)
  tq2 = constant(1.0, yesterdayQ)
  timeQ = max(tq1, tq2)
Put it All Together:
  recentQ = and(subjectQ, timeQ)
  return max(recentQ, or(marketsQ, compIdsQ)^0.01)
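A quick way to check that these weights produce the intended ordering is to score the nine document classes from the use-case table. The Python sketch below does this using the simplified semantics from the earlier slides (and = product, or = sum, max = maximum). It assumes a raw score of 1.0 for every matching clause in the low-weight fallback; that value, like the function names, is purely illustrative.

```python
def constant(f, matched):
    """constant(f, A): f when A matches, otherwise 0.0."""
    return f if matched else 0.0


def news_score(markets, terms, companies, today, yesterday):
    """Score one document given which clauses it matches."""
    subject = max(
        constant(4.0, markets and terms),
        constant(3.0, markets),
        constant(2.0, terms),
        constant(1.0, companies),
    )
    time = max(constant(10.0, today), constant(1.0, yesterday))
    recent = subject * time  # and(A, B) = A * B
    # or(marketsQ, compIdsQ)^0.01, assuming a raw score of 1.0 per match.
    fallback = 0.01 * ((1.0 if markets else 0.0) + (1.0 if companies else 0.0))
    return max(recent, fallback)


# (markets, terms, companies, today, yesterday), in the table's order.
rows = [
    (True, True, False, True, False),    # 1 markets+terms, today
    (True, False, False, True, False),   # 2 markets, today
    (False, True, False, True, False),   # 3 terms, today
    (False, False, True, True, False),   # 4 companies, today
    (True, True, False, False, True),    # 5 markets+terms, yesterday
    (True, False, False, False, True),   # 6 markets, yesterday
    (False, True, False, False, True),   # 7 terms, yesterday
    (False, False, True, False, True),   # 8 companies, yesterday
    (True, False, False, False, False),  # 9 markets, older
]
scores = [news_score(*row) for row in rows]
print(scores)
```

Because the "today" weight (10.0) exceeds the largest subject weight (4.0), every document from today outranks every document from yesterday, and the 0.01 fallback keeps older documents at the bottom.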
19. Embedded Search Example #1
Tokenize:
  qTerms = solr.tokenize(qTerms);
Execute an Embedded Search:
  results = solr.search('subjectsCore', or(qTerms), 50)
Create a query from the results:
  subjectsQ = or(results*.subjectId)
Put it all together:
  return field("title", and(qTerms)) | subjectsQ^0.9;
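The two-stage pattern here (search one core, then feed the hits into the main query) can be illustrated without Solr. In the Python sketch below, the in-memory list, the search_any helper, and the sample subjects are all hypothetical; the output string merely mirrors the shape of the QPL expression. The list comprehension plays the role of Groovy's spread operator in results*.subjectId.

```python
# Hypothetical stand-in for the 'subjectsCore' index.
SUBJECTS_CORE = [
    {"subjectId": "S1", "text": "american presidents"},
    {"subjectId": "S2", "text": "colonial history"},
    {"subjectId": "S3", "text": "space exploration"},
]


def search_any(core, terms, limit):
    """Return up to `limit` documents containing at least one term."""
    hits = [doc for doc in core
            if any(term in doc["text"].split() for term in terms)]
    return hits[:limit]


def build_query(q_terms):
    """Build the final expression from an embedded search's results."""
    results = search_any(SUBJECTS_CORE, q_terms, 50)
    subject_ids = [doc["subjectId"] for doc in results]
    title_q = 'field("title", and(' + ", ".join(q_terms) + "))"
    if not subject_ids:
        return title_q
    return title_q + " | or(" + ", ".join(subject_ids) + ")^0.9"


print(build_query(["colonial", "presidents"]))
# field("title", and(colonial, presidents)) | or(S1, S2)^0.9
```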
20. Embedded Search Example #2
Tokenize:
  qTerms = solr.tokenize(qTerms);
Execute an Embedded Search:
  results = solr.search('categories', and(qTerms), 10)
Create a Solr named list:
  myList = solr.newList();
  myList.add("relatedCategories", results*.title);
Add it to the XML response:
  solr.addResponse(myList)
21. Other Features
• Embedded Grouping Queries
  • Oh yes they did!
• Proximity operators
  • ADJ, NEAR/#, BEFORE/#
• Reverse Lemmatizer
  • Prefers exact matches over variants
• Transformer
  • Applies transformations recursively to query trees
22. Query Processing Language
[Diagram labels: Solr; User Interface; QPL Engine; Search; QPL Script; "Data as entered by user"; "Boolean Query Expression"; Application Dev Team; Search Team]
23. QPL: Using External Sources to Build Queries
[Diagram labels: Solr; User Interface; QPL Engine; Search; QPL Script; RDBMS; Other Indexes; Thesaurus]