Date posted: 12-Apr-2017
Category: Technology
Uploaded by: martina-helene-welander
Google Is Just a Two Page Site
Relevant Results with Sitecore.ContentSearch
Martina Helene Welander
Technical Consulting Engineer, Sitecore
Speaker
• Technical Consulting Engineer at Sitecore
• Community and Information Enthusiast
• Ecosystem Sites with Dnepropetrovsk Team
Martina Helene Welander
Hi!
• Martina Welander
• Technical Consulting Engineer
• Ecosystem sites
• mhwelander.net / @mhwelander
In the direction of awesome, that’s where
…let’s do search!
Can haz knowledge?
“Google is simply a search box with a second page of results. And those results are from other sites!”
Lalala hello world examples lalala ten items in my tree!
Sitecore.ContentSearch 101
Sitecore 7
Search and index
ALL the items
Search API (LINQ-based)
Search Technology Provider (DLLs and Configuration)
Search Technology API and Indexes
IEnumerable<DocSearchResult>
var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_master_index");
using (var context = index.CreateSearchContext())
{
    // Build the LINQ query against the strongly typed result class.
    var query = context.GetQueryable<ResultItem>().Where(x => x.Title == "Hej");
    // GetResults() executes the query against the index.
    var executedResults = query.GetResults();
    myModel.myList = executedResults.Hits.Select(x => x.Document).ToList();
}
Where Sitecore adds value
• Source content to index to strongly typed object – and back again!
• You can actually index anything
• Provider model – Solr, Lucene, Elastic Search, Azure Search
• Provider-agnostic LINQ-based search API
• Highly configurable
Sitecore.ContentSearch is an API
Where should I focus my efforts?
CONFIIIIIG!
Crawlers
Mappers
Converters
Sitecore Field -> Index Field -> Object Property
Analyzers
Sitecore Field -> Searchable Data
Analyzer Wrappers
Back to Plain Ol’ Search
Actually kind of difficult
It’s all about the Pentiums analyzers (Tokenizers and Filters)
Tokenizers
Hello my name is Martina
“Hello”, “my”, “name”, “is”, “Martina”
Types of Tokenizer
StandardTokenizer
“My name is Martina” -> “My”, “name”, “is”, “Martina”
KeywordTokenizer
“My name is Martina” -> “My name is Martina”
N-Gram Tokenizer (Min 4, Max 5)
“sitecore” -> “site”, “itec”, “teco”, “ecor”, “core”, “sitec”, “iteco” … etc
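The three tokenizer behaviours above can be sketched in plain C#. This is a minimal illustration only: `TokenizerSketch` and its methods are hypothetical names, not the Lucene or Solr tokenizer classes, and the order in which the n-grams come out is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-ins only; these are not the Lucene or Solr tokenizers.
public static class TokenizerSketch
{
    // Rough stand-in for a standard tokenizer: split on spaces and basic punctuation.
    public static List<string> Standard(string text)
    {
        return text
            .Split(new[] { ' ', ',', '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();
    }

    // Keyword tokenizer: the whole input becomes a single token.
    public static List<string> Keyword(string text)
    {
        return new List<string> { text };
    }

    // N-gram tokenizer: every substring with a length between min and max,
    // shorter grams first, sliding one character at a time.
    public static List<string> NGram(string text, int min, int max)
    {
        var grams = new List<string>();
        for (var size = min; size <= max; size++)
        {
            for (var start = 0; start + size <= text.Length; start++)
            {
                grams.Add(text.Substring(start, size));
            }
        }
        return grams;
    }
}
```

Calling `TokenizerSketch.NGram("sitecore", 4, 5)` yields the five 4-grams followed by the four 5-grams, which shows how quickly an n-gram field grows compared with the other two tokenizers.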
Filters
Examples of Filters
• Standard Filter
• (Snowball) Porter Stem Filter
• Stop Filter
• Synonym Filter
• Keep Words Filter
• Pattern Replace Filter
ORDER MATTERS!
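A minimal sketch of why filter order matters, using two hand-rolled filters in plain C#. The stop word list, the class name and the method names are all assumptions made for illustration; real Lucene and Solr filters are configured rather than written like this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterOrderSketch
{
    // Hypothetical stop word list, lowercase only.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "is", "my", "the" };

    public static IEnumerable<string> LowerCase(IEnumerable<string> tokens)
    {
        return tokens.Select(t => t.ToLowerInvariant());
    }

    // Case-sensitive on purpose: only tokens that match the list exactly are removed.
    public static IEnumerable<string> RemoveStopWords(IEnumerable<string> tokens)
    {
        return tokens.Where(t => !StopWords.Contains(t));
    }

    // Lowercase first, so "My" becomes "my" and is caught by the stop filter.
    public static List<string> LowerCaseThenStop(IEnumerable<string> tokens)
    {
        return RemoveStopWords(LowerCase(tokens)).ToList();
    }

    // Stop filter first, so "My" slips through and is only lowercased afterwards.
    public static List<string> StopThenLowerCase(IEnumerable<string> tokens)
    {
        return LowerCase(RemoveStopWords(tokens)).ToList();
    }
}
```

With the tokens “My”, “name”, “is”, “Martina”, the first chain keeps “name” and “martina”, while the second also keeps “my”. Same filters, different index content.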
Indexing Process
Index: “Hello, my name is Martina” -> “Hello”, “my”, “name”, “Martina”
Query: Contains(“Hello, my name is Martina”)
Results
Rebuild when analyser changes!
Configuring a custom analyzer
Lucene – What does it look like?
Solr – What does it look like?
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>
Previewing and help
6492 12:54:21 INFO ExecuteQueryAgainstLucene (sitecore_master_index): content:make~0.7 title:make~0.7 content:new~0.7 title:new~0.7 content:item~0.7 title:item~0.7 - Filter :
Debugging A Lucene-Based ContentSearch In Sitecore - Dan Cruickshank
My Super-Duper Analyzer
…which isn’t very special at all
• Standard analyser
• Standard filter
• Porter Stem Filter
• StopWords Filter
• Synonym Filter (EXM / ECM, PXM / APS)*
• Lowercase filter
The Query
What makes something relevant? (tf.idf)
• tf – term frequency
• idf – inverse document frequency
• coord – # of terms found in document
• fieldNorm – field length
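To make the four factors concrete, here is a rough sketch of how they are calculated in Lucene's classic TF-IDF similarity. It is simplified on purpose: query normalisation and boosts are left out, and the class and method names are made up for this example.

```csharp
using System;

// Simplified take on Lucene's classic TF-IDF factors; boosts and queryNorm are left out.
public static class ClassicScoringSketch
{
    // tf: square root of how often the term occurs in the field.
    public static double Tf(int termFrequency)
    {
        return Math.Sqrt(termFrequency);
    }

    // idf: terms that appear in few documents score higher.
    public static double Idf(int docFrequency, int totalDocs)
    {
        return 1.0 + Math.Log((double)totalDocs / (docFrequency + 1));
    }

    // coord: fraction of the query terms that the document contains.
    public static double Coord(int matchedTerms, int queryTerms)
    {
        return (double)matchedTerms / queryTerms;
    }

    // fieldNorm: shorter fields weigh more than longer ones.
    public static double FieldNorm(int termsInField)
    {
        return 1.0 / Math.Sqrt(termsInField);
    }

    // Contribution of a single term in a single field: tf * idf^2 * fieldNorm.
    public static double TermScore(int termFrequency, int docFrequency, int totalDocs, int termsInField)
    {
        var idf = Idf(docFrequency, totalDocs);
        return Tf(termFrequency) * idf * idf * FieldNorm(termsInField);
    }
}
```

The fieldNorm factor is one reason a match in a short title tends to outscore the same match buried in a long body field, even before any boosting is applied.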
My fields
• Title
• Text
• Byline
• Keywords
• Product
context.GetQueryable<ResultItem>().Where(…)
.Filter() vs .Where()
#1 – Find me a match
• Equals()
• Contains()
• StartsWith()
.Where(x => x.ResultsTitle.Contains("scaling"))
.Where(x => x["scaling"].Contains("scaling"))
.Match()
.EndsWith()
#2 – Slop and fuzziness!
• Like()
• Fuzzy search – fuzziness factor (float)
• Phrase search – slop (int)
#3 – I love you, PredicateBuilder
// Seed with False because the terms are combined with Or.
Expression<Func<ResultItem, bool>> predicate = PredicateBuilder.False<ResultItem>();
foreach (var word in list)
{
    predicate = predicate.Or(x => x.Title.Contains(word));
}
False for ‘OR’, True for ‘AND’
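Why the seed matters can be shown without Sitecore at all. The sketch below uses plain `Func<string, bool>` delegates as a stand-in for PredicateBuilder expressions; `PredicateSeedSketch` and its methods are hypothetical helpers, and the matching is a simple case-sensitive `Contains`.

```csharp
using System;
using System.Collections.Generic;

// Plain delegates standing in for PredicateBuilder expressions; not the Sitecore API.
public static class PredicateSeedSketch
{
    // Chain one Contains check per word with OR, starting from the given seed.
    public static Func<string, bool> AnyWord(IEnumerable<string> words, bool seed)
    {
        Func<string, bool> predicate = title => seed;
        foreach (var word in words)
        {
            var previous = predicate; // the chain built so far
            var current = word;       // this iteration's word
            predicate = title => previous(title) || title.Contains(current);
        }
        return predicate;
    }

    // Chain one Contains check per word with AND, starting from the given seed.
    public static Func<string, bool> AllWords(IEnumerable<string> words, bool seed)
    {
        Func<string, bool> predicate = title => seed;
        foreach (var word in words)
        {
            var previous = predicate;
            var current = word;
            predicate = title => previous(title) && title.Contains(current);
        }
        return predicate;
    }
}
```

An OR chain seeded with `true` matches every title, and an AND chain seeded with `false` matches none, which is the trap the slide warns about.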
#4 – Boost
• At query time
• At index time (type or field)
• Rules-based
BOOST
~1000 real items
storageType=“true”
Attempt #1: EVERYTHING
If the title…
• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase
If the content…
• Like phrase (with slop)
• Contains phrase
• Starts with phrase
• Equals phrase
Search: xDB Scaling
Search: Managing engagement plans
Search: Create engagement plans
A couple of important lessons
• Whole Phrases vs Individual Terms
• Boost()
• Contains() / Equals()
Attempt #2: Phrase and terms
“engagement plan setup” OR
“engagement” OR “plan” OR “setup”
“engagement” AND “plan” OR
“engagement” AND “setup” OR
“plan” AND “setup” OR
“engagement” AND “plan” AND “setup”
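The clause list above can be generated from the search phrase instead of being written by hand. The following is a sketch in plain C# that only builds the clause strings. `PhraseAndTermsSketch` is a hypothetical helper, and turning each clause into a ContentSearch predicate is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseAndTermsSketch
{
    // Returns the whole phrase, then each term, then every AND combination of 2..n terms.
    public static List<string> Clauses(string phrase)
    {
        var terms = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var clauses = new List<string> { Quote(phrase) };

        // Size 1 gives the individual terms; larger sizes give the AND combinations.
        for (var size = 1; size <= terms.Length; size++)
        {
            // Each bit in the mask decides whether the term at that position is picked.
            for (var mask = 1; mask < (1 << terms.Length); mask++)
            {
                var picked = terms.Where((t, i) => (mask & (1 << i)) != 0).ToArray();
                if (picked.Length == size)
                {
                    clauses.Add(string.Join(" AND ", picked.Select(t => Quote(t))));
                }
            }
        }
        return clauses;
    }

    private static string Quote(string value)
    {
        return "\"" + value + "\"";
    }
}
```

`string.Join(" OR ", PhraseAndTermsSketch.Clauses("engagement plan setup"))` gives one phrase clause plus seven term clauses. The number of combinations doubles with every extra word, so this approach only suits short search phrases.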
Needs more boost
Attempt #3: Favouring titles
Sitecore 7 ContentSearch Tips - Matt Burke
“Finding a user’s search term in the title or keywords of a document is probably more relevant than one where the term is only in the body”
My work in progress
If nothing is working, you probably didn’t rebuild your index
Search: xDB Scaling
Search: Manage engagement plans
Search: Create engagement plans
// TODO: On the plane home
• Keywords
• Location
• Pinning exact title matches – “scaling”
• Expected search phrases with boost – e.g. “scaling xDB”, “xDB scaling”, “xDB scaling options”
xDB
• Key Behaviour Cache – developer or editor?
• Common searches
It’s not all queries and indexes
• Vague titles are a bit of a nightmare
• Review use of keywords in content
• “I would never search for that!”
• Continuous user testing and tuning
What I learned
• It isn’t magic
• Get to know the provider
• Content and content structure matter
• Search is actually quite hard
Organizers
Sponsor
Thanks to our… &…