Intro to Solr in Drupal

Posted on 10-May-2015


description

Does your website have a ton of data? How do your users find the relevant pages among all the noise on your site? Solr can help deliver pertinent search results to your users regardless of your site's size. Apache Solr is a Java search server that, paired with the Apache Solr contrib module for Drupal, lets your users quickly search millions of records and narrow down the results with minimal system impact.

transcript

Intro to Solr

DrupalCon Portland

Andrew Riley, Director of Drupal Development

@andrewmriley

Agenda

• Search?

• Why Solr?

• Searching

• Behind the Scenes

Search?

What is Search?

Search (v): to go or look through (a place, area, etc.) carefully in order to find something missing or lost: I searched the desk for the letter.

Source: http://dictionary.reference.com/browse/search

@Mediacurrent

Why Users Search

• Navigation doesn't make sense

• It can be faster

• Lots of data

• Frequent data changes

• Might just be looking for something


Search Problems

• Search accuracy

• Too much data

• Slow response

• Wrong results


Why Solr?

History

Solr was created in 2004 as an in-house project at CNET. It was open-sourced in 2006, when CNET donated it to the Apache Software Foundation.


Lucene

• Solr is a layer on top of Lucene

• Lucene is a search library

• Solr stores its index in Lucene's file format



Speed

Search speed is important!


Speed

Source: Web Performance Today http://j.mp/12h8wLZ


Speed

• Important!

• It scales well

• No database required

• Clustering & sharding

• Netflix runs 1.2 million queries/day on 4 servers*

*http://wiki.apache.org/solr/SolrPerformanceData


Natural Results

• Stemming: "blogging" matches "blog"

• Stop word removal: common words such as "the" are ignored

• Synonyms: "tissue" matches "Kleenex"

• Highly configurable
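Synonym matching can be pictured as query-time expansion: each search term is looked up in a synonym map and its equivalents are added to the query. The sketch below is a minimal illustration under that assumption; SYNONYMS and expand_query are hypothetical names, not part of Solr or the Drupal module. In Solr itself, synonyms are normally configured in a synonyms file read by a synonym filter in the field's analysis chain, not in application code.

```python
# Hypothetical synonym map for illustration only. In Solr, synonyms usually
# live in a synonyms file used by a synonym filter, not in application code.
SYNONYMS = {
    "tissue": {"kleenex"},
    "kleenex": {"tissue"},
}


def expand_query(terms):
    """Return the lowercased query terms plus any configured synonyms,
    keeping the original order and skipping duplicates."""
    expanded = []
    for term in terms:
        term = term.lower()
        if term not in expanded:
            expanded.append(term)
        # sorted() keeps the output deterministic when a term has several synonyms
        for syn in sorted(SYNONYMS.get(term, ())):
            if syn not in expanded:
                expanded.append(syn)
    return expanded


print(expand_query(["Tissue", "box"]))  # ['tissue', 'kleenex', 'box']
```

A search for "tissue box" therefore also matches documents that only mention "Kleenex".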


Drupal Search

• Not stemmed by default

• Queries the database

• Stores tokenized words in a single large table

• Much slower to index


VS

Searching

Ordering

• Results are ordered by score

• The score comes from Lucene

• Scores are not "out of 100"

• A higher score ranks first

More Info: http://lucene.apache.org/core/3_6_1/scoring.html

[Figure: results ordered by descending score (???, 201, 200, 199, 184)]
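A minimal sketch of "bigger score first", reusing the example scores from the figure; the Result type and the titles are hypothetical. Solr already returns results sorted by score, highest first, by default, so this only makes the rule visible. Because scores are relative to a single query rather than "out of 100", they are only meaningful when compared within one result set.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str    # hypothetical document title
    score: float  # relevance score from Lucene; not a percentage


def order_results(results):
    """Sort results so the highest relevance score comes first."""
    return sorted(results, key=lambda r: r.score, reverse=True)


hits = [
    Result("Blog post C", 199.0),
    Result("Blog post A", 201.0),
    Result("Blog post D", 184.0),
    Result("Blog post B", 200.0),
]

for hit in order_results(hits):
    print(f"{hit.score:>6.1f}  {hit.title}")
```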


Facets

• Users narrow down the results themselves

• Fixes the "too much data" problem

• Native to Solr

• In Drupal, requires the Facet API module

• Familiar from shopping sites
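Faceting boils down to counting how many matching documents carry each value of a field, then letting the user pick a value to filter on. The sketch below shows that idea in plain Python under stated assumptions: the product documents and the facet_counts and apply_filter helpers are hypothetical. In Solr the counting happens on the server (requested with query parameters such as facet=true and facet.field), and the Facet API module turns the counts into Drupal blocks.

```python
from collections import Counter

# Hypothetical product documents; field names are illustrative only.
DOCS = [
    {"title": "Trail shoe", "brand": "Acme", "color": "red"},
    {"title": "Road shoe", "brand": "Acme", "color": "blue"},
    {"title": "Sandal", "brand": "Zenith", "color": "red"},
]


def facet_counts(docs, field):
    """Count how many documents have each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filter(docs, field, value):
    """Narrow the result set to documents whose `field` equals `value`."""
    return [doc for doc in docs if doc.get(field) == value]


print(facet_counts(DOCS, "brand"))  # Counter({'Acme': 2, 'Zenith': 1})

# The user clicks "red"; the brand counts are recomputed on the narrowed set.
red = apply_filter(DOCS, "color", "red")
print(facet_counts(red, "brand"))   # Counter({'Acme': 1, 'Zenith': 1})
```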


Behind the Scenes

Index?

• An index contains documents

• Documents have fields

• Fields have terms

• Updates can take ~2 minutes to appear in results

• Queries use Lucene syntax
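The index, documents, fields, terms hierarchy can be illustrated with a tiny inverted index that maps each (field, term) pair to the documents containing it. This is a simplification for illustration: build_index and search are hypothetical helpers, and terms here are just lowercased whitespace-separated words rather than the output of Solr's configurable analysis chain. A lookup on (field, term) corresponds loosely to a Lucene field query such as title:solr.

```python
from collections import defaultdict


def build_index(documents):
    """Map (field, term) -> set of document ids: a tiny inverted index.

    Each document is a dict of field name -> text. Terms are simply the
    lowercased whitespace-separated words of each field.
    """
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return index


def search(index, field, term):
    """Return sorted ids of documents whose `field` contains `term`."""
    return sorted(index.get((field, term.lower()), set()))


docs = {
    1: {"title": "Intro to Solr", "body": "Solr is fast"},
    2: {"title": "Drupal search", "body": "Drupal core search uses the database"},
}
idx = build_index(docs)

print(search(idx, "title", "solr"))   # [1]
print(search(idx, "body", "search"))  # [2]
```

Because lookups go straight from a term to its documents, query time depends on the number of matches rather than on scanning every record.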


Tokenizing

• Splits text into words and numbers: "this" "is" "blogging"

• Excludes stopwords: "this" "blogging"

• Handles stemming (if enabled): "this" "blog"

• Very configurable
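The three steps on this slide can be sketched as a small analysis pipeline. This is a toy, not Solr's analyzer: the stopword list is an assumption chosen to reproduce the slide's example (it drops "is" but keeps "this"), and toy_stem is a hypothetical suffix stripper, not the Porter or Snowball stemmer that Solr can be configured to use.

```python
import re

# Assumed stopword list chosen to reproduce the slide's example. Real stopword
# lists are configurable per field and are usually much longer.
STOPWORDS = {"a", "an", "and", "is", "of", "the", "to"}


def tokenize(text):
    """Split text into lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(token):
    """Toy stemmer (NOT Porter/Snowball): strip a trailing 'ing' and undouble
    a final consonant, so 'blogging' -> 'blogg' -> 'blog'."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) > 2 and token[-1] == token[-2] and token[-1] not in "aeiouls":
            token = token[:-1]
    return token


def analyze(text, stem=True):
    """Run tokenizing, stopword removal and (optionally) stemming in order."""
    tokens = remove_stopwords(tokenize(text))
    return [toy_stem(t) for t in tokens] if stem else tokens


print(tokenize("This is blogging"))             # ['this', 'is', 'blogging']
print(analyze("This is blogging", stem=False))  # ['this', 'blogging']
print(analyze("This is blogging"))              # ['this', 'blog']
```

The same chain runs on both indexed text and queries, which is why a search for "blog" can match a page that says "blogging".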


Bias

• Adjusts the order of search results

• Works on content type, fields, comment count, "Promoted to home page" status, and more

• Can be made dynamic with custom modules
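Bias can be pictured as multiplying each result's base score by boost factors before ordering. In practice this kind of bias is applied by Solr itself at query time (for example through boost parameters sent with the query); the sketch below reranks in Python only to make the arithmetic visible. TYPE_BOOST, PROMOTED_BOOST, and the result records are hypothetical values, not the module's actual settings.

```python
# Hypothetical boost factors, for illustration only.
TYPE_BOOST = {"article": 2.0, "page": 1.0}
PROMOTED_BOOST = 1.5


def biased_score(result):
    """Multiply the base relevance score by content-type and promotion boosts."""
    score = result["score"] * TYPE_BOOST.get(result["type"], 1.0)
    if result.get("promoted"):
        score *= PROMOTED_BOOST
    return score


def rerank(results):
    """Order results by their biased score, highest first."""
    return sorted(results, key=biased_score, reverse=True)


results = [
    {"title": "About us", "type": "page", "score": 3.0, "promoted": False},
    {"title": "Solr tips", "type": "article", "score": 2.0, "promoted": False},
    {"title": "Launch news", "type": "article", "score": 1.2, "promoted": True},
]

for r in rerank(results):
    print(round(biased_score(r), 2), r["title"])
```

Here "About us" has the best raw score, but the article boost lifts "Solr tips" (4.0) and the extra promotion boost lifts "Launch news" (3.6) above it (3.0).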


Recap

Modules

• Apache Solr (apachesolr)

• Facet API (facetapi)

• Chaos tool suite (ctools)


Overall

• Search is becoming increasingly important

• You want control over your search results

• If you don't provide a good search experience, somebody else will

• Solr doesn't have to be complex

• Solr is fast and scales


Thank You!

Questions?

@Mediacurrent Mediacurrent.com

andrew.riley@mediacurrent.com

@andrewmriley

slideshare.net/mediacurrent