Date post: 29-Mar-2015
Upload: dakota-bunts
Practical Solr Guide for Developers
Transcript
Page 1: Practical Solr Guide for Developers. First…some questions. How many of you in the room know what Solr is? How many have worked with Solr? How many will.

Practical Solr Guide for Developers


First…some questions.

• How many of you in the room know what Solr is?

• How many have worked with Solr?

• How many will be using Solr or text search technology in their upcoming projects?


Why am I here speaking to you about this?

• Several projects in 2011/2012 involving search technology
  • One of the most visited recipe sites in the US, with 200,000 hits per hour during peak times
  • Resource portal for the world’s leading vendor of large-format printers
• First encounter was with Lucene.NET, which led to Solr
• Second encounter was with Solr on Azure
• Afterwards, Jetty and Tomcat configurations
• Currently working on https://github.com/radekz2/SolrStarterKit

99bugs.com


Solr and Lucene

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java.


Not Frictionless

• Java

• Complex configuration

• Still evolving documentation

• Too many brief tutorials


What we will talk about today.

• Getting up and running
• Setting up as a service
• Importing data
• Spelling
• Stopwords, Synonyms, Elevate
• Facets
• Replication, ZooKeeper (Cloud setup)
• Integration deep dives
• Etc.


Solr and Lucene


[Architecture diagram. Labels: Web Clients; CMS; PHP; Bash/PowerShell etc.; Web Server; Solr web application (Solr.war); Document Repositories. Inside the Solr web application:]

• Core1 (recipes): data-config.xml, solrconfig.xml, schema.xml
• Core2 (food articles): data-config.xml, solrconfig.xml, schema.xml
• Core3 (etc.): data-config.xml, solrconfig.xml, schema.xml


Solr Terminology

http://wiki.apache.org/solr/SolrTerminology

Solr Core: Also referred to as just a "Core". This is a running instance of a Solr index along with all of its configuration (SolrConfigXml, SchemaXml, etc.). A single Solr application can contain 0 or more cores, which are run largely in isolation but can communicate with each other if necessary via the CoreContainer. From a historical perspective: Solr initially supported only one index, and the SolrCore class was a singleton for coordinating the low-level functionality at the "core" of Solr. When support was added for creating and managing multiple Cores on the fly, the class was refactored to no longer be a Singleton, but the name stuck.

Facet: A distinct feature or aspect of a set of objects; "a way in which a resource can be classified" (*)

Request Handler: A Solr component that processes requests. For example, the DisMaxRequestHandler processes search queries by calling the DisMax Query Parser. Request Handlers can perform other functions, as well.


Solr Terminology

Solr Core: Searchable grouping of documents (index). E.g.

Core 1 = Recipes
Core 2 = Articles about Food

Facet: categorisation

Request Handler: Functional grouping under a URL, a lot like a route in PHP frameworks, e.g.

/core1/search -> searches recipes
/core1/importxml -> triggers importing from XML files
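To make the route analogy concrete, here is a minimal Python sketch that builds URLs for the two handlers named above. The base URL and the helper name `handler_url` are assumptions made for illustration, and both handlers would have to be registered under those names in the core's solrconfig.xml. `command=full-import` is the usual DataImportHandler parameter for starting an import, assuming /importxml is backed by that handler.

```python
from urllib.parse import urlencode

# Assumed base URL of the Solr web application; adjust host, port and path.
SOLR_BASE = "http://localhost:8080/solr"


def handler_url(core, handler, **params):
    """Build the URL of a request handler on a core (hypothetical helper)."""
    url = f"{SOLR_BASE}/{core}/{handler.strip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url


# /core1/search -> searches recipes
search_url = handler_url("core1", "/search", q="beef stew", rows=10)

# /core1/importxml -> triggers importing from XML files
import_url = handler_url("core1", "/importxml", command="full-import")

print(search_url)
print(import_url)
```

The point of the sketch is only that each handler is addressed by core name plus handler path, and everything else travels as query parameters.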


Starting Solr in under 1 minute

Requirements:
• Downloaded and unpacked Solr
• JRE installed (http://java.com)

1. Via the command line, navigate to /apache-solr-3.6.1/example
2. Run: java -Dsolr.solr.home=multicore -jar start.jar

* Also see README.txt in /apache-solr-3.6.1/example


Solr With Tomcat

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="C:/solr_tomcat/apache-solr-3.5.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="C:/solr_tomcat" override="true"/>
</Context>

http://wiki.apache.org/solr/SolrTomcat

Context file location: C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\Catalina\localhost


Files and Directories

• solr
  • core0
    • conf
      • schema.xml
      • solrconfig.xml
      • data-config.xml
      • dataimport.properties
      • solrcore.properties
    • data
  • core1
  • solr.xml


solr.xml

<solr persistent="false" sharedLib="global_libs">
  <!-- adminPath: RequestHandler path to manage cores.
       If 'null' (or absent), cores will not be manageable via request handler -->
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>

Tip: use the sharedLib="global_libs" attribute
Other options: http://wiki.apache.org/solr/CoreAdmin

Solr web application settings. Define your cores here, along with a few global settings.
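As a quick sanity check before restarting Solr, the file can be parsed with any XML library. The following is a minimal Python sketch, not part of Solr itself; the helper names `list_cores` and `is_well_formed` are made up for illustration. It also shows why typographic quotes, which presentation software tends to substitute for plain ones, make the file unusable.

```python
import xml.etree.ElementTree as ET

# The multicore solr.xml from the slide, typed with plain ASCII quotes.
SOLR_XML = """<solr persistent="false" sharedLib="global_libs">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>"""


def list_cores(xml_text):
    """Return {core name: instanceDir} for every <core> element."""
    root = ET.fromstring(xml_text)
    return {core.get("name"): core.get("instanceDir") for core in root.iter("core")}


def is_well_formed(xml_text):
    """True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(list_cores(SOLR_XML))

# Swap one plain closing quote for a typographic one (U+201C): the attribute
# value is no longer terminated where intended and the parser rejects the file.
BROKEN = SOLR_XML.replace('"false"', '"false\u201c')
print(is_well_formed(BROKEN))
```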


schema.xml

schema.xml is where you describe your data.

• Lucene field definitions with analysis chain
• Column names and their respective Lucene types
• Unique key
• Default search field
• Default operator (AND/OR), to be deprecated in the future

http://lucidworks.lucidimagination.com/display/solr/Solr+Field+Types#SolrFieldTypes-FieldTypesIncludedwithSolr

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

http://wiki.apache.org/solr/LanguageAnalysis

Gotcha: Multivalued fields cannot be sorted


dataimport.properties and solrcore.properties

• dataimport.properties
  • Status file
  • Managed by Solr
  • Contains import information such as the time of the last import
• solrcore.properties
  • Contains core-specific settings assigned by the developer
  • Settings can be passed to the data import definition file

mycore.languagegroup=en
mycore.filenamefilter=.*(en|eew|enw|eez|eep)\.(xml)

In the data config, these options can be retrieved as:
${mycore.languagegroup}
${mycore.filenamefilter}
etc.
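The substitution can be pictured with a small Python sketch. This is only an illustration of the idea, not Solr's implementation: the parser ignores the escaping rules of real Java .properties files, and the `<entity>` template with its `name` and `fileName` attributes is a hypothetical example of where such a value might be used. The last lines show which file names the filter regular expression accepts.

```python
import re

# The two settings from the slide, as they would appear in solrcore.properties.
SOLRCORE_PROPERTIES = r"""
mycore.languagegroup=en
mycore.filenamefilter=.*(en|eew|enw|eez|eep)\.(xml)
"""


def load_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def expand(template, props):
    """Replace each ${name} with its property value; unknown names stay as-is."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: props.get(m.group(1), m.group(0)),
                  template)


props = load_properties(SOLRCORE_PROPERTIES)

# Hypothetical data-config fragment that picks up the file name filter.
entity = expand('<entity name="files" fileName="${mycore.filenamefilter}"/>', props)
print(entity)

# The filter is a regular expression matched against file names.
pattern = re.compile(props["mycore.filenamefilter"])
print(bool(pattern.fullmatch("recipes_en.xml")))
print(bool(pattern.fullmatch("recipes_fr.xml")))
```

Keeping such values in solrcore.properties lets the same data import definition be reused across cores that differ only in language group or file filter.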


Importing

Gotcha: The XPathEntityProcessor implements a streaming parser which supports a subset of XPath syntax. Complete XPath syntax is not supported, but most of the common use cases are covered, as follows:

xpath="/a/b/subject[@qualifier='fullTitle']"
xpath="/a/b/subject/@qualifier"
xpath="/a/b/c"
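To see what these three expressions select, here is a small Python sketch run against a made-up sample document. It uses the standard library's ElementTree, which is not Solr's parser and has its own XPath subset, so it only approximates the behaviour: paths are written relative to the root element, and attribute values are read with `get()` because ElementTree cannot return attributes from a path.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document shaped to fit the three expressions.
SAMPLE = """<a>
  <b>
    <subject qualifier="fullTitle">Beef Stew</subject>
    <subject qualifier="shortTitle">Stew</subject>
    <c>Main course</c>
  </b>
</a>"""

root = ET.fromstring(SAMPLE)  # root is <a>, so "./b/..." corresponds to "/a/b/..."

# xpath="/a/b/subject[@qualifier='fullTitle']": element text, filtered by attribute
full_titles = [e.text for e in root.findall("./b/subject[@qualifier='fullTitle']")]

# xpath="/a/b/subject/@qualifier": the attribute value of each matching element
qualifiers = [e.get("qualifier") for e in root.findall("./b/subject")]

# xpath="/a/b/c": plain element text
c_values = [e.text for e in root.findall("./b/c")]

print(full_titles)
print(qualifiers)
print(c_values)
```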

Gotcha: SQL Timeouts

• From XML
  • XML can originate in a single file, multiple files (same schema) or HTTP
  • Solr will loop over common data nodes using its for-each mechanism
• From Database
  • You will need a JDBC driver for your database
  • Can run multiple queries with reference variables passed from one entity to another
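The "reference variables" idea can be sketched in Python with an in-memory SQLite database. This is an analogy rather than Solr code: the tables `recipe` and `ingredient` are invented, and the `?` placeholder plays the role that a reference such as `${recipe.id}` plays in a child entity's query (assuming the parent entity is named `recipe`). The query counter shows why imports with nested database entities can be slow, a point the gotchas list at the end of the talk also raises: the child query runs once per parent row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipe (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ingredient (recipe_id INTEGER, name TEXT);
    INSERT INTO recipe VALUES (1, 'Beef stew'), (2, 'Pancakes');
    INSERT INTO ingredient VALUES (1, 'beef'), (1, 'carrot'), (2, 'flour');
""")


def build_documents(conn):
    """Build one document per parent row, running a child query for each."""
    docs = []
    queries_run = 1  # the parent query
    parents = conn.execute("SELECT id, title FROM recipe ORDER BY id").fetchall()
    for recipe_id, title in parents:
        # Child query parameterised with a value from the parent row.
        children = conn.execute(
            "SELECT name FROM ingredient WHERE recipe_id = ? ORDER BY name",
            (recipe_id,),
        ).fetchall()
        queries_run += 1
        docs.append({
            "id": recipe_id,
            "title": title,
            "ingredient": [name for (name,) in children],
        })
    return docs, queries_run


docs, queries_run = build_documents(conn)
print(docs)
print(queries_run)
```

With N parent rows the sketch issues N + 1 queries, so a single joined query or a cached child lookup is worth considering when the parent table is large.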


JDBC Timeouts

<dataSource name="jdbc"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            type="JdbcDataSource"
            url="jdbc:sqlserver://dvtaoomb.database.windows.net:1433;database=DB_Infrastructure;user=ConsumerSitesDev@dvtaoomb;password=@SecurePwd;encrypt=true;hostNameInCertificate=data.ch1-1.database.windows.net"
            responseBuffering="full">
  <property name="testOnBorrow" value="true"/>
  <property name="validationQuery" value="SELECT 1"/>
</dataSource>

http://commons.apache.org/dbcp/configuration.html


Stop Words

a, an, and, are, as, at, be, but, by,
for, if, in, into, is, it, no, not, of,
on, or, s, such, t, that, the, their, then,
there, these, they, this, to, was, will, with

• Stop words list is in /apache-solr-3.6.1/example/example-DIH/solr/solr/conf

• You can find more stopwords using schema browser
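The effect of a stop word filter is easy to picture with a short Python sketch. It is a simplification, assuming lower-casing and whitespace tokenisation; in Solr the filtering happens inside the field's analysis chain, and the function name here is made up.

```python
# The stop word list shown on this slide.
STOP_WORDS = frozenset(
    "a an and are as at be but by "
    "for if in into is it no not of "
    "on or s such t that the their then "
    "there these they this to was will with".split()
)


def remove_stop_words(text):
    """Lower-case, split on whitespace and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


tokens = remove_stop_words("The beef stew with carrots and a pinch of salt")
print(tokens)
```

Only the remaining tokens would be indexed or searched, which is why a query made up entirely of stop words can return nothing.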


Spellcheck

• Solr will build a spell index from the existing index
• The spell index is a separate set of index files, and its build needs to be triggered
• Spell index generation should be called only once; do not call it with every query

http://wiki.apache.org/solr/SpellCheckComponent

http://localhost:8080/solr/Core_ImportXml/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&spellcheck.build=true&spellcheck.q=stering&spellcheck=true

Note: spellcheck.build=true is needed only once, to build the spellcheck index from the main Solr index. Building takes time, so it should not be specified with each request.

Note: Combine multiple fields into single spell field using <copyField source="ProductDescription" dest="ProductSpellText"/>
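A client can keep the one-off build separate from ordinary requests. The following Python sketch only constructs the two kinds of URL and sends nothing; the helper name is hypothetical, and the host, port and core name are taken from the example URL above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Select handler of the core used in the example URL above.
SELECT_URL = "http://localhost:8080/solr/Core_ImportXml/select"


def spellcheck_url(term, build=False):
    """Build a spellcheck request; pass build=True only for the one-off index build."""
    params = {
        "q": "*:*",
        "start": 0,
        "rows": 10,
        "spellcheck": "true",
        "spellcheck.q": term,
    }
    if build:
        params["spellcheck.build"] = "true"
    return SELECT_URL + "?" + urlencode(params)


build_once = spellcheck_url("stering", build=True)  # run once, e.g. after an import
per_query = spellcheck_url("stering")               # use for every user query

print(build_once)
print(sorted(parse_qs(urlsplit(per_query).query)))
```

Triggering the build from the import job rather than from the search front end is one way to make sure it never ends up on the per-query path.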

Gotcha: solr.PorterStemFilterFactory


Transformers

• RegexTransformer
• ScriptTransformer
• DateFormatTransformer
• NumberFormatTransformer
• TemplateTransformer
• HTMLStripTransformer
• ClobTransformer
• LogTransformer

http://wiki.apache.org/solr/DataImportHandler#Transformer


Synonyms

beefstew = Beef stew

Query Elevate

Bring certain documents to the top based on the query


Documentation

• http://wiki.apache.org/solr/DataImportHandler
• http://wiki.apache.org/solr/SolrTerminology
• http://wiki.apache.org/solr/SpellCheckComponent
• http://lucidworks.lucidimagination.com/display/solr/Solr+Field+Types#SolrFieldTypes-FieldTypesIncludedwithSolr
• http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
• http://wiki.apache.org/solr/LanguageAnalysis
• http://commons.apache.org/dbcp/configuration.html
• http://wiki.apache.org/solr/SimpleFacetParameters
• http://wiki.apache.org/solr/SolrRequestHandler
• http://wiki.apache.org/solr/IntegratingSolr


Gotchas

• Form content type
  • http://stackoverflow.com/questions/2997014/can-you-use-post-to-run-a-query-in-solr-select
  • application/xml (not application/x-www-form-urlencoded)
• Multivalued fields cannot be sorted
• Dates (use date transformers)
• JDBC timeouts
• Slow indexing with multiple database entities
• XPath limitations
• Can you recreate your updates?
• Are you storing enough data?


Thank You!

Radek Zajkowski

https://joind.in/7458

www.99bugs.com

