Date posted: 27-May-2015
Uploaded by: nick-moline
Nephelococcygia (noun): the act of searching for shapes in clouds
Nick Moline, Justia’s Cloud Farmer (@NickMoline, http://www.nick.pro/)
Guillermo Balboa, Software Developer
Search Engines we’ve tried…
• Google Mini / Search Appliance
• Google Custom Search Engine
• Sphinx
• Apache SOLR
• Amazon Cloud Search
Google Mini
Pros
• Really simple to set up (for web pages or documents)
• Can run inside your firewall
• Great highlighting and snippet generation
• Creates a “Cached Version” even of PDFs
• Pugs love them
Cons
• No discrete field searching (other than operators like title:)
• Physical branded box to install and support
• Very limited control over the look and feel of results
• No geospatial search
• No JSON version
• Discontinued
Google Custom Search Engine
Pros
• Really simple to set up (for web pages or documents)
• If your site is already indexed, no wait time to get started
• Great highlighting and snippet generation
Cons
• No discrete field searching (other than operators like title:)
• Minimal control over when new content gets indexed
• Very limited control over the look and feel of results
• The JSON/XML version can only return 4 or 8 results at a time
• No geospatial search
Sphinx
Pros
• Very fast for searching
• Full control of when content is indexed
• Good geospatial search built in
• Newer versions can be connected to with MySQL libraries and queried like a DB
• Field Boosting!
Cons
• Very slow for indexing
• Requires reindexing ALL content, every time
• Doesn’t return any of the textual content, so ALWAYS requires a separate database query
• Filters and faceting only on numeric fields
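Because Sphinx hands back document IDs rather than the text itself, each search is followed by a database lookup. The Python sketch below shows one way to do that second query while keeping Sphinx’s ranking. It assumes the IDs have already come back from the search; the `documents` table, its columns, and the `hydrate` helper are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


def hydrate(conn, ranked_ids):
    """Fetch stored rows for the IDs a search returned, in the search's rank order."""
    if not ranked_ids:
        return []
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = conn.execute(
        f"SELECT id, title, body FROM documents WHERE id IN ({placeholders})",
        list(ranked_ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) returns rows in arbitrary order, so reorder by rank and
    # skip IDs whose rows have since been deleted from the database.
    return [by_id[i] for i in ranked_ids if i in by_id]


# Stand-in database with a few hypothetical documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "First", "..."), (2, "Second", "..."), (3, "Third", "...")],
)

ranked_ids = [3, 1]  # hypothetical: IDs in the order the search engine ranked them
print([title for _, title, _ in hydrate(conn, ranked_ids)])
```

Fetching all IDs in one `IN (...)` query keeps it to a single extra round trip per search, at the cost of re-sorting in application code.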
Apache SOLR
Pros
• Very extendable and configurable
• Full control of when content is indexed
• Geospatial with the “LocalSOLR” plugin
• Returns content
• Does highlighting
• Tons of faceting options
• Sharding and cores
• Field Boosting
• Document Boosting!
Cons
• Very difficult to optimize performance
• Adding content
• The more content you return, the slower it gets
• Highlighting performance is poor
• Sharding and cores are, again, hard to optimize
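In Solr, field boosting is expressed at query time. A minimal Python sketch that builds such a request URL, assuming the extended DisMax (`edismax`) parser; the core URL, the `text` field, and the boost weight are hypothetical, while `word_mark` is borrowed from the trademark example. It also switches on Solr’s highlighter (`hl`, `hl.fl`), which the slide notes can be slow.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical core URL; host, port and core name depend on the install.
SOLR_SELECT = "http://localhost:8983/solr/trademarks/select"


def solr_search_url(term, start=0, rows=10):
    """Build a Solr select URL that weights word_mark matches above body text."""
    params = {
        "q": term,
        "defType": "edismax",        # extended DisMax parser, which supports qf boosts
        "qf": "word_mark^10 text",   # field boosting: word_mark matches boosted 10x
        "fl": "id,word_mark,score",  # stored fields to return with each hit
        "hl": "true",                # turn on Solr's highlighter...
        "hl.fl": "text",             # ...for snippets from the text field
        "start": start,
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = solr_search_url("amazon")
print(url)
print(parse_qs(urlsplit(url).query)["qf"])  # ['word_mark^10 text']
```

One request does the ranking that the CloudSearch workaround needs several searches for.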
Amazon Cloud Search
Pros
• Extremely fast
• Automatically scales (no thinking)
• Automatically shards when adding content
• Easy re-indexing of content
• Returns content for creating snippets
• Easy JSON implementation
Cons
• No control of the scaling
• No geo (yet)
• No highlighting/snippet gen (yet)
• No field boosting (yet)
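Since Cloud Search returns the document content but no highlighting, snippets have to be built by the application. A minimal Python sketch of one approach: center a fixed-width window on the first query-term hit and bold every term inside it. The `make_snippet` name, the 80-character default, and the `<b>` markup are illustrative choices, not part of Cloud Search.

```python
import html
import re


def make_snippet(text, terms, width=80):
    """Return an HTML excerpt of `text` around the first query term, with terms bolded."""
    terms = [t for t in terms if t]
    pattern = (
        re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE) if terms else None
    )
    match = pattern.search(text) if pattern else None
    # Center the window on the first hit, or fall back to the start of the text.
    start = max(0, match.start() - width // 2) if match else 0
    end = min(len(text), start + width)
    excerpt = text[start:end]
    pieces, last = [], 0
    if pattern:
        for m in pattern.finditer(excerpt):
            pieces.append(html.escape(excerpt[last:m.start()]))
            pieces.append("<b>" + html.escape(m.group(0)) + "</b>")
            last = m.end()
    pieces.append(html.escape(excerpt[last:]))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(pieces) + suffix


print(make_snippet("Amazon Technologies filed a trademark application for AMAZON PRIME.", ["amazon"]))
```

Escaping each piece separately keeps the inserted `<b>` tags intact while neutralizing any markup in the stored content.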
Getting Around Lack of Field Boosting
• Duplicate the Word Mark field as both a text field (word_mark) and a literal field (literal_word_mark)
• Do 4 searches:
– Exact Word Mark match
  bq=(and type:'trademark_case' literal_word_mark:'amazon')
– Prefix Word Mark match
  bq=(and type:'trademark_case' (and (not literal_word_mark:'amazon') literal_word_mark:'amazon*'))
– Anywhere Word Mark match
  bq=(and type:'trademark_case' (and (not literal_word_mark:'amazon*') word_mark:'amazon'))
– Full text search
  bq=(and type:'trademark_case' (and (not literal_word_mark:'amazon*') (not literal_word_mark:'amazon') (not word_mark:'amazon')))
• Pass the counts from each search along with the pagination links
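The four-search workaround can be sketched in Python as below. `tier_queries` reproduces the slide’s four `bq` expressions for a given term, and `plan_page` is a hypothetical helper showing one way to use the passed-along counts: it turns a global page offset into per-search start/size values. It assumes the term contains no quotes or backslashes, and that the full-text tier is sent with its own free-text query alongside the `bq` filter; neither detail is stated on the slide.

```python
def tier_queries(term):
    """Build the four Cloud Search bq expressions, most specific tier first."""
    # Assumption: the term is a plain word; quotes would need escaping.
    if "'" in term or "\\" in term:
        raise ValueError("term must not contain quotes or backslashes")
    base = "type:'trademark_case'"
    exact = f"literal_word_mark:'{term}'"
    prefix = f"literal_word_mark:'{term}*'"
    anywhere = f"word_mark:'{term}'"
    return [
        f"(and {base} {exact})",                          # exact word mark
        f"(and {base} (and (not {exact}) {prefix}))",     # prefix, minus exact
        f"(and {base} (and (not {prefix}) {anywhere}))",  # anywhere in word mark
        f"(and {base} (and (not {prefix}) (not {exact}) (not {anywhere})))",  # the rest
    ]


def plan_page(counts, offset, page_size):
    """Map a global result offset onto (tier, start, size) requests.

    counts[i] is the hit count reported by tier i (hypothetically carried
    along in the pagination links so it need not be re-queried).
    """
    requests = []
    tier_start = 0         # global position of the current tier's first hit
    remaining = page_size  # results still needed to fill this page
    for tier, count in enumerate(counts):
        tier_end = tier_start + count
        if remaining > 0 and offset < tier_end:
            start = offset - tier_start  # position inside this tier
            size = min(remaining, count - start)
            requests.append((tier, start, size))
            remaining -= size
            offset += size
        tier_start = tier_end
    return requests


for bq in tier_queries("amazon"):
    print(bq)
print(plan_page([1, 3, 5, 100], offset=0, page_size=10))
```

For example, with tier counts of 1, 3, 5 and 100, the first 10-result page pulls the single exact match, all 3 prefix matches, all 5 anywhere matches, and the top full-text hit.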
We want more!
• Trademarks
– Field boosting would greatly simplify things!
– Snippet generation could make for nicer search snippets
• Law.justia.com
– Using Google Custom Search now
– Must have snippet generation
– Must have field boosting
• Lawyers.justia.com
– Using Sphinx right now
– Must have geospatial
– Must have field boosting