Transcript
Page 1: Intro to Solr in Drupal

Intro to Solr

Page 2: Intro to Solr in Drupal

DrupalCon Portland

Page 3: Intro to Solr in Drupal
Page 4: Intro to Solr in Drupal

Andrew Riley
Director of Drupal Development

@andrewmriley

Page 5: Intro to Solr in Drupal

Agenda

• Search?

• Why Solr?

• Searching

• Behind the Scenes

Page 6: Intro to Solr in Drupal

Search?

Page 7: Intro to Solr in Drupal

What is Search?

Search (v): to go or look through (a place, area, etc.) carefully in order to find something missing or lost: I searched the desk for the letter.

Source: http://dictionary.reference.com/browse/search

@Mediacurrent

Page 8: Intro to Solr in Drupal

Why Users Search

• Navigation doesn't make sense

• It can be faster

• Lots of data

• Frequent data changes

• Might just be looking for something

Page 9: Intro to Solr in Drupal

Search Problems

• Search accuracy

• Too much data

• Slow response

• Wrong results

Page 10: Intro to Solr in Drupal

Why

Solr?

Page 11: Intro to Solr in Drupal

History

Solr was initially created in 2004 as an in-house project for CNET. It was open sourced in 2006 and donated to the Apache Software Foundation.

Page 12: Intro to Solr in Drupal

Lucene

• Solr is a layer on top of Lucene

• Lucene is a library

• Solr stores its index files in Lucene's format

*http://wiki.apache.org/solr/SolrPerformanceData

Page 13: Intro to Solr in Drupal

Speed

Search speed is important!

Page 14: Intro to Solr in Drupal

Speed

Source: Web Performance Today http://j.mp/12h8wLZ

Page 15: Intro to Solr in Drupal

Speed

• Important!

• It scales well

• No database required

• Clustering & Sharding

• Netflix runs 1.2 million queries per day on 4 servers*

*http://wiki.apache.org/solr/SolrPerformanceData

Page 16: Intro to Solr in Drupal

Natural Results

• Stemming: "Blogging" matches "Blog"

• Stop word removal: "The" is ignored

• Synonyms: "Tissue" matches "Kleenex"

• Highly configurable
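
The synonym bullet above can be sketched in a few lines of Python. This is only a toy illustration, not how Solr implements synonyms: the `SYNONYMS` map and the `normalize` and `matches` helpers are hypothetical names made up for this example.

```python
# Hypothetical synonym map: every key is rewritten to its value.
SYNONYMS = {"kleenex": "tissue"}

def normalize(text):
    """Lowercase the text, split on whitespace, and map synonyms to one term."""
    return {SYNONYMS.get(token, token) for token in text.lower().split()}

def matches(query, document):
    """A document matches when it contains every normalized query term."""
    return normalize(query) <= normalize(document)

print(matches("Kleenex", "Box of tissue"))  # True
print(matches("napkin", "Box of tissue"))   # False
```

Because both the query and the document pass through the same normalization, a search for "Kleenex" finds content that only mentions "tissue".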

Page 17: Intro to Solr in Drupal

Drupal Search

• Not stemmed by default

• Queries the database

• Stores tokenized words in a single large table

• Much slower to index

Page 18: Intro to Solr in Drupal

VS

Page 19: Intro to Solr in Drupal

Searching

Page 20: Intro to Solr in Drupal

Ordering

• Score

• Comes from Lucene

• Not "out of 100"

• Higher scores come first

More Info: http://lucene.apache.org/core/3_6_1/scoring.html

[Figure: example results in descending order: 201, 200, 199, 184]
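
As a rough sketch of how to see the score for yourself, the Python below builds a Solr select URL that asks for each document's score and sorts by it. The host, port, and `/solr/select` path are assumptions about a local single-core install, and `build_select_url` is a hypothetical helper; `q`, `fl`, `sort`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: a single-core Solr instance on localhost; adjust for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def build_select_url(keywords, rows=10):
    """Build a Solr select URL that returns each document's score."""
    params = {
        "q": keywords,
        "fl": "*,score",       # all stored fields plus the relevance score
        "sort": "score desc",  # bigger score first (also Solr's default order)
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = build_select_url("drupal search")
query = parse_qs(urlsplit(url).query)
print(query["sort"])  # ['score desc']
```

The score that comes back is a raw relevance number, so it is only meaningful when compared with other results of the same query.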

Page 21: Intro to Solr in Drupal

Facets

• Users do the work

• Fixes too much data

• Native to Solr

• Requires the Facet API module

• Shopping Sites
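
To make the idea concrete, here is a minimal Python sketch of faceting on a shopping-style result set. It is not Solr or Facet API code: the `DOCS` data and the `facet_counts` and `apply_facet` helpers are hypothetical and only show the concept of counting values and letting the user narrow the results.

```python
from collections import Counter

# Hypothetical search results for a shopping site.
DOCS = [
    {"title": "Trail shoe", "brand": "Acme", "color": "red"},
    {"title": "Road shoe", "brand": "Acme", "color": "blue"},
    {"title": "Sandal", "brand": "Zenith", "color": "red"},
]

def facet_counts(docs, field):
    """Count how many results carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)

def apply_facet(docs, field, value):
    """Keep only the results whose field equals the value the user clicked."""
    return [doc for doc in docs if doc.get(field) == value]

print(facet_counts(DOCS, "color"))           # Counter({'red': 2, 'blue': 1})
red_docs = apply_facet(DOCS, "color", "red")
print([doc["title"] for doc in red_docs])    # ['Trail shoe', 'Sandal']
```

Each click on a facet value shrinks the result set, which is how facets let users deal with too much data themselves.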

Page 22: Intro to Solr in Drupal

Behind the

Scenes

Page 23: Intro to Solr in Drupal

Index?

• Index contains Documents

• Documents have Fields

• Fields have Terms

• Updates take about 2 minutes to appear

• Uses Lucene syntax
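
The documents, fields, and terms hierarchy can be sketched as a tiny inverted index in Python. This is a simplified illustration, not Lucene's actual index format: `build_index`, `search`, and the sample documents are hypothetical, and the query parser only understands a single `field:term` pair.

```python
from collections import defaultdict

def build_index(documents):
    """Map each (field, term) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return index

def search(index, query):
    """Look up a single 'field:term' query, in the style of Lucene syntax."""
    field, _, term = query.partition(":")
    return sorted(index.get((field, term.lower()), set()))

# Hypothetical documents, each with two fields.
documents = {
    1: {"title": "Intro to Solr", "body": "Solr is a layer on top of Lucene"},
    2: {"title": "Drupal Search", "body": "Queries the database"},
}
index = build_index(documents)
print(search(index, "title:solr"))     # [1]
print(search(index, "body:database"))  # [2]
```

Looking up a term is a dictionary access rather than a scan over every document, which is the basic reason an index makes search fast.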

Page 24: Intro to Solr in Drupal

Tokenizing

• Splits words and numbers: "this" "is" "blogging"

• Excludes stopwords: "this" "blogging"

• Handles stemming (if enabled): "this" "blog"

• Very configurable
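
The three steps above can be mimicked with a toy pipeline in Python. It is only a sketch: the stop word list is a hypothetical one chosen to match the slide, and `toy_stem` is a deliberately naive suffix stripper, not the stemmer Solr actually uses.

```python
import re

# Hypothetical stop word list; "this" is kept, as in the slide's example.
STOPWORDS = {"a", "an", "is", "the"}

def tokenize(text):
    """Split text into lowercase word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stopwords(tokens):
    """Drop tokens that appear in the stop word list."""
    return [token for token in tokens if token not in STOPWORDS]

def toy_stem(token):
    """Naive stemmer: strip '-ing' and undo a doubled final consonant."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    return token

def analyze(text):
    """Run the full chain: tokenize, remove stop words, then stem."""
    return [toy_stem(token) for token in remove_stopwords(tokenize(text))]

print(tokenize("This is blogging"))  # ['this', 'is', 'blogging']
print(analyze("This is blogging"))   # ['this', 'blog']
```

The same chain has to run on both the indexed content and the search keywords, otherwise "blogging" in a query would never meet "blog" in the index.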

Page 25: Intro to Solr in Drupal

Bias

• Adjusts the order of search results

• Works on: Content Type, Fields, Comments, Promoted to Home Page and more

• Can be dynamic with custom modules.
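
One way to picture bias is as a multiplier applied to each result's score before sorting. The Python below is a conceptual sketch only and does not reflect how the Apache Solr module applies bias internally: the boost values, the `biased_score` and `rank` helpers, and the sample results are all hypothetical.

```python
# Hypothetical bias settings: multipliers per content type and for promoted items.
TYPE_BOOST = {"article": 2.0, "page": 1.0}
PROMOTED_BOOST = 1.5

def biased_score(result):
    """Scale the raw score by the content type and promoted-to-home-page bias."""
    score = result["score"] * TYPE_BOOST.get(result["type"], 1.0)
    if result.get("promoted"):
        score *= PROMOTED_BOOST
    return score

def rank(results):
    """Order results by biased score, highest first."""
    return sorted(results, key=biased_score, reverse=True)

# Hypothetical raw results, listed in order of their unbiased score.
results = [
    {"title": "A", "type": "page", "score": 3.0, "promoted": False},
    {"title": "C", "type": "page", "score": 2.5, "promoted": True},
    {"title": "B", "type": "article", "score": 2.0, "promoted": False},
]
print([r["title"] for r in rank(results)])  # ['B', 'C', 'A']
```

Without bias the order would be A, C, B; the article and promoted multipliers are what move B and C ahead of A.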

Page 26: Intro to Solr in Drupal

Recap

Page 27: Intro to Solr in Drupal

Modules

• Apache Solr (apachesolr)

• Facet API (facetapi)

• Chaos tool suite (ctools)

Page 28: Intro to Solr in Drupal

Overall

• Search is becoming more and more important

• You want to control your search results

• If you don't provide a good search experience, somebody else will.

• Solr doesn't have to be complex.

• Solr is fast and scales.

Page 29: Intro to Solr in Drupal

Thank You!

Questions?

@Mediacurrent Mediacurrent.com

@andrewmriley

slideshare.net/mediacurrent

