Apache Solr! Enterprise Search Solutions at your Fingertips!

Posted on 07-May-2015



Get an overview of Apache Solr as an enterprise search server. Get to know the available alternatives and why Solr is cool! Get excited! Enterprise search solutions are ready to pick.



Murshed Ahmmad Khan @usamurai, murshed2k@gmail.com

Presented at phpXperts seminar 2011…

The criteria…

Enterprise Search Server

Fast

Flexible

Powerful

Scalable

Relevant Results

Production-Ready & Easy Deployment

What name comes to mind…

??

Apache Solr!

Fits all the above-mentioned criteria…

Solr, What is it…?

- Open-source Java application
- Runs as a standalone full-text search server within any servlet container
- Uses the Lucene Java search library at its core

SOLR WORK FLOW…

Solr History…
- Developed at CNET Networks by Yonik Seeley
- Donated to the ASF (Apache Software Foundation) in early 2006

Solr History…(2)

- Incubation period ended in January 2007 (Solr 1.2, released later in 2007, was the first post-incubation release)
- Solr is now maintained as a subproject of Lucene

Solr - Features…

Powerful Full-Text search…

Hit Highlighting…

Faceted Search…

Tag Clouds…

Geo-spatial search…
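Features such as hit highlighting and faceted search are switched on per request through query parameters. A minimal sketch, assuming Solr is reachable at `http://localhost:8080/solr` (the Tomcat setup used in this deck) and using the `content` and `contentType` fields from the schema shown later; `hl`, `hl.fl`, `facet`, `facet.field` and `wt` are standard Solr request parameters, while the helper name `buildFeatureQueryUrl` is hypothetical.

```php
<?php
// Hypothetical helper: builds a /select URL that asks for highlighted
// snippets and facet counts in addition to the normal result list.
function buildFeatureQueryUrl(string $baseUrl, string $query): string
{
    $params = array(
        'q'           => $query,
        'wt'          => 'json',        // ask for a JSON response
        'hl'          => 'true',        // turn on hit highlighting
        'hl.fl'       => 'content',     // field(s) to build snippets from
        'facet'       => 'true',        // turn on faceting
        'facet.field' => 'contentType', // count matches per contentType value
    );
    // http_build_query() URL-encodes keys and values and joins them with '&'.
    return rtrim($baseUrl, '/') . '/select?' . http_build_query($params);
}

echo buildFeatureQueryUrl('http://localhost:8080/solr', 'content:apache'), "\n";
```

To facet on several fields, Solr expects `facet.field` repeated once per field; `http_build_query()` would turn a PHP array into `facet.field[0]=...` instead, so such a URL has to be assembled by hand.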

Solr – Features (cont…)
- Database integration
- Rich document (Word, PDF) handling
- REST-like HTTP/XML and JSON APIs (so you can code in virtually any language)
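Because the API is plain HTTP, any language that can fetch a URL and decode JSON can act as a client. A minimal sketch: the sample string mimics the shape of Solr's standard JSON response (`responseHeader`, `response.numFound`, `response.docs`), but the document values are invented for illustration and `extractSolrHits` is a hypothetical helper. In a live setup the string would come from fetching a `/select?...&wt=json` URL, for example with `file_get_contents()`.

```php
<?php
// Illustrative response body in Solr's JSON response format (values are made up).
$sampleJson = '{
  "responseHeader": {"status": 0, "QTime": 2},
  "response": {
    "numFound": 2, "start": 0,
    "docs": [
      {"id": "aawaj-profile-1", "contentType": "profile"},
      {"id": "aawaj-profile-7", "contentType": "profile"}
    ]
  }
}';

// Hypothetical helper: returns the total hit count and the ids on this page.
function extractSolrHits(string $json): array
{
    $data = json_decode($json, true); // true => decode objects as associative arrays
    if (!is_array($data) || !isset($data['response'])) {
        throw new RuntimeException('Not a Solr JSON response');
    }
    $ids = array();
    foreach ($data['response']['docs'] as $doc) {
        $ids[] = $doc['id'];
    }
    return array('numFound' => (int) $data['response']['numFound'], 'ids' => $ids);
}

$hits = extractSolrHits($sampleJson);
echo $hits['numFound'], " hits: ", implode(', ', $hits['ids']), "\n";
```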

CLIENT API SUPPORT…
- Java (SolrJ)
- .NET (solrnet, SolrSharp)
- PHP (SolPHP)
- Python (SolPython)
- Ruby (on Rails) (rsolr, acts-as-solr, sunspot)
- C++
- XML/HTTP
- JSON/HTTP (AJAX Solr)
- Perl (SolPerl)
- and more

Solr - Features… (cont…)
- Flexible configuration
- Extensive plugin architecture for advanced customization
- Scalable distributed search, dynamic clustering, index replication
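Distributed search in Solr 1.4 is driven by the `shards` request parameter: the node that receives the query fans it out to every listed shard and merges the results. Shard entries are written as `host:port/path`, without an `http://` prefix. A minimal sketch; the host names are hypothetical, and `buildShardedQueryUrl` is an illustrative helper, not part of any client library.

```php
<?php
// Hypothetical helper: send the query to one node and let it fan out to the shards.
function buildShardedQueryUrl(string $entryNode, array $shards, string $query): string
{
    $params = array(
        'q'      => $query,
        // Comma-separated host:port/path entries, no "http://" prefix.
        'shards' => implode(',', $shards),
    );
    return 'http://' . $entryNode . '/select?' . http_build_query($params);
}

// Hypothetical two-node setup.
$url = buildShardedQueryUrl(
    'search1.example.com:8080/solr',
    array('search1.example.com:8080/solr', 'search2.example.com:8080/solr'),
    'topic:solr'
);
echo $url, "\n";
```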

Alternatives to Solr
- Use Google (GSA, the Google Search Appliance; has integration problems)
- FAST (stopped supporting Linux)
- Use Lucene (and write your own code on top of it)

Alternatives to Solr…(2)
- Use your database (has performance issues)
- Sphinx (written in C++)
- Commercial libraries (e.g. lucidimagination.com)
- Write your own

Who Uses Solr/Lucene?

More names: http://wiki.apache.org/solr/PublicServers

OPERATING SYSTEM SUPPORT
- Any OS with a Java VM, including:
  - Linux (all versions)
  - Windows (all versions)
  - MacOS (all versions)
  - Unix variants

APP SERVER SUPPORT
- Apache Tomcat
- Jetty
- Resin
- WebLogic™
- WebSphere™
- GlassFish
- dm Server™
- JBoss™
- and many more

Requirement: Java JDK 1.5 or later

INSTALLATION

1. Download the latest versions of Apache Solr and Apache Tomcat (the commands below use apache-solr-1.4.1 and apache-tomcat-6.0.35).

2. Extract both archives:
$ tar -xzvf ./apache-solr-1.4.1.tgz
$ tar -xzvf ./apache-tomcat-6.0.35.tar.gz


3. Copy the Solr WAR file into Tomcat's webapps folder as solr.war:
$ cp apache-solr-1.4.1/dist/apache-solr-1.4.1.war apache-tomcat-6.0.35/webapps/solr.war

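4. Copy the example/solr directory (the Solr home, including its conf/ folder) into the Tomcat home directory. Unless solr.solr.home is set, Solr looks for ./solr relative to the directory Tomcat is started from:
$ cp -r apache-solr-1.4.1/example/solr apache-tomcat-6.0.35/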


5. Start the Tomcat server from its home directory:
$ cd apache-tomcat-6.0.35
$ ./bin/startup.sh

6. Visit http://localhost:8080/solr/admin/

YOU ARE DONE…

CREATE SCHEMA.XML
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="service" type="string" indexed="true" stored="true" required="true" />
<field name="contentType" type="string" indexed="true" stored="true" required="true" />
<field name="dbId" type="long" indexed="true" stored="true" />
<field name="content" type="text" indexed="true" stored="true" />
<!-- copyField needs a declared destination; it must be multiValued because many fields are copied into it -->
<field name="all" type="text" indexed="true" stored="false" multiValued="true" />
<copyField source="*" dest="all" />
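Solr rejects an added document that lacks a field declared with `required="true"`. A minimal client-side sketch that mirrors the three required fields from the schema above, so a bad record can be logged before it is sent; the hard-coded field list and the helper name `missingRequiredFields` are assumptions for illustration.

```php
<?php
// Fields declared with required="true" in the schema.xml fragment above.
const SOLR_REQUIRED_FIELDS = array('id', 'service', 'contentType');

// Hypothetical helper: lists required fields that are missing or empty.
function missingRequiredFields(array $doc): array
{
    $missing = array();
    foreach (SOLR_REQUIRED_FIELDS as $field) {
        if (!isset($doc[$field]) || $doc[$field] === '') {
            $missing[] = $field;
        }
    }
    return $missing;
}

print_r(missingRequiredFields(array('id' => 'aawaj-profile-42', 'service' => 'aawaj')));
```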

INDEX DOCUMENTS (INDEXER)

The Common Loop


1. <add>: add single or multiple documents

// Build one Solr document for a user profile, then send it to Solr.
$doc = new SolrSimpleDocument(array(
    new SolrSimpleField('id', 'aawaj-profile-' . $user->id),
    new SolrSimpleField('service', 'aawaj'),
    new SolrSimpleField('contentType', 'profile'),
    new SolrSimpleField('dbId', (string) $user->id)
));
$this->solr->add($doc);
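Under the hood, a client call like the one above ends up POSTing an XML `<add>` message to Solr's `/update` handler. A minimal sketch of that message built from plain arrays; `buildAddXml` is a hypothetical helper, the user ids are made up, and the exact XML a given client library emits may differ in whitespace or attributes.

```php
<?php
// Hypothetical helper: turns an array of documents (field => value, or
// field => array of values) into a Solr XML <add> update message.
function buildAddXml(array $docs): string
{
    $esc = function ($s) {
        return htmlspecialchars((string) $s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    };
    $xml = '<add>';
    foreach ($docs as $doc) {
        $xml .= '<doc>';
        foreach ($doc as $name => $values) {
            // Multi-valued fields are written as repeated <field> elements.
            foreach ((array) $values as $value) {
                $xml .= '<field name="' . $esc($name) . '">' . $esc($value) . '</field>';
            }
        }
        $xml .= '</doc>';
    }
    return $xml . '</add>';
}

// A loop over hypothetical user ids, standing in for rows read from a database.
$docs = array();
foreach (array(7, 8) as $userId) {
    $docs[] = array(
        'id'          => 'aawaj-profile-' . $userId,
        'service'     => 'aawaj',
        'contentType' => 'profile',
        'dbId'        => (string) $userId,
    );
}
echo buildAddXml($docs), "\n";
```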


2. <commit/>

Commit the pending documents in one go, making them visible to searches.

$this->solr->commit();


3. <optimize/>

Optimize the index (merge its segments into one) to improve search performance.

$this->solr->optimize();
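The commit and optimize calls are likewise small XML messages (`<commit/>`, `<optimize/>`) POSTed to the same `/update` handler. A minimal sketch using only PHP streams, assuming Solr at `http://localhost:8080/solr`; `solrUpdateRequest` and `sendSolrUpdate` are hypothetical names, and only the request-building part runs without a live server.

```php
<?php
// Hypothetical helper: describes an update request without sending it.
function solrUpdateRequest(string $baseUrl, string $xmlBody): array
{
    return array(
        'url'     => rtrim($baseUrl, '/') . '/update',
        'method'  => 'POST',
        'header'  => "Content-Type: text/xml; charset=utf-8\r\n",
        'content' => $xmlBody,
    );
}

// Hypothetical helper: actually sends the request (needs a running Solr).
function sendSolrUpdate(array $request): string
{
    $context = stream_context_create(array('http' => array(
        'method'  => $request['method'],
        'header'  => $request['header'],
        'content' => $request['content'],
    )));
    $response = file_get_contents($request['url'], false, $context);
    if ($response === false) {
        throw new RuntimeException('Solr update request failed');
    }
    return $response;
}

$base     = 'http://localhost:8080/solr';
$commit   = solrUpdateRequest($base, '<commit/>');   // make added docs searchable
$optimize = solrUpdateRequest($base, '<optimize/>'); // merge index segments
// With Solr running: sendSolrUpdate($commit); sendSolrUpdate($optimize);
```

Committing once after a batch of adds, rather than after every document, keeps indexing fast, since each commit opens a new searcher.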

SOLR QUERY SYNTAXES

QUERY SYNTAXES (RDBMS)

SELECT * FROM post
WHERE (topic LIKE '%apache%' OR author LIKE '%kabir%')
   OR (topic LIKE '%solr%' OR author LIKE '%frank%')
ORDER BY id DESC

QUERY SYNTAXES (SOLR)

topic:"The Right Way" AND author:WrongGuy
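For comparison, the SQL query above can be expressed roughly as a Solr request: the `WHERE` clause becomes the `q` parameter and `ORDER BY` becomes the `sort` parameter. It is only an approximation, since `LIKE '%apache%'` is a substring match while Solr matches indexed tokens. A minimal sketch; the `topic` and `author` fields come from the SQL example rather than from the schema above, and `buildSolrOrQuery` is a hypothetical helper.

```php
<?php
// Hypothetical helper: ORs field:term pairs inside each group, then ORs the groups.
// Assumes single-word terms that need no escaping.
function buildSolrOrQuery(array $groups): string
{
    $parts = array();
    foreach ($groups as $group) {
        $clauses = array();
        foreach ($group as $field => $term) {
            $clauses[] = $field . ':' . $term;
        }
        $parts[] = '(' . implode(' OR ', $clauses) . ')';
    }
    return implode(' OR ', $parts);
}

$q = buildSolrOrQuery(array(
    array('topic' => 'apache', 'author' => 'kabir'),
    array('topic' => 'solr', 'author' => 'frank'),
));
$params = array('q' => $q, 'sort' => 'id desc'); // ORDER BY id DESC
echo $q, "\n";
```

If `id` is a string field, as in the schema above, Solr sorts it lexicographically, unlike a numeric SQL column.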

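BOOSTING TERMS (SOLR)

topic:"jakarta apache"^4 "Apache Lucene"

boosts the phrase "jakarta apache" by a factor of 4, so documents matching it rank higher than those matching only "Apache Lucene"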
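Boosted clauses can also be generated in code. A minimal sketch with a hypothetical `boostedPhrase` helper that quotes a phrase, backslash-escapes embedded quotes and backslashes, and appends the `^` boost factor in the same form as the query above.

```php
<?php
// Hypothetical helper: builds field:"phrase"^boost in Lucene query syntax.
function boostedPhrase(string $field, string $phrase, float $boost = 1.0): string
{
    // Escape quotes and backslashes so they cannot end the quoted phrase early.
    $clause = $field . ':"' . addcslashes($phrase, '"\\') . '"';
    // A boost of 1 is the default, so only non-default boosts are written out.
    return $boost == 1.0 ? $clause : $clause . '^' . $boost;
}

$q = boostedPhrase('topic', 'jakarta apache', 4) . ' "Apache Lucene"';
echo $q, "\n";
```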

FUZZY SEARCH (SOLR)

topic:roam~ (terms spelled similarly to "roam")

matches terms such as "foam" and "roams", based on the Levenshtein distance (edit distance) algorithm
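The edit distance behind `roam~` counts the single-character insertions, deletions and substitutions needed to turn one word into another. A minimal sketch of the classic dynamic-programming algorithm; PHP also ships a built-in `levenshtein()` function, used here only as a cross-check. This shows the distance itself, not how Lucene turns it into a fuzzy-match threshold.

```php
<?php
// Classic two-row dynamic-programming Levenshtein (edit) distance.
// Works on bytes, so it is exact for ASCII words like these.
function editDistance(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    $prev = range(0, $lenB); // distances from "" to each prefix of $b
    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array($i); // distance from the first $i chars of $a to ""
        for ($j = 1; $j <= $lenB; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,        // deletion
                $curr[$j - 1] + 1,    // insertion
                $prev[$j - 1] + $cost // substitution (or match)
            );
        }
        $prev = $curr;
    }
    return $prev[$lenB];
}

foreach (array('foam', 'roams', 'rome', 'dream') as $word) {
    printf("roam -> %-5s distance %d (built-in: %d)\n",
        $word, editDistance('roam', $word), levenshtein('roam', $word));
}
```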

PROXIMITY SEARCH (SOLR)

"jakarta apache"~10

searches for "jakarta" and "apache" within 10 words of each other in a document
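To make the idea concrete, here is a simplified positional check: tokenize the text, record each word's position, and test whether the two words ever appear within N positions of each other. This is an approximation for illustration only; Lucene's slop counts how many position moves are needed to form the phrase, so words in reversed order need a larger slop than this window check suggests. The tokenizer (lowercase, split on non-alphanumerics) and the helper name `withinDistance` are assumptions.

```php
<?php
// Hypothetical helper: true if $w1 and $w2 occur at most $maxDistance
// positions apart anywhere in $text (word order is ignored here).
function withinDistance(string $text, string $w1, string $w2, int $maxDistance): bool
{
    // Very simple tokenizer: lowercase, split on anything not a letter or digit.
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $pos1 = array_keys($tokens, strtolower($w1), true);
    $pos2 = array_keys($tokens, strtolower($w2), true);
    foreach ($pos1 as $p) {
        foreach ($pos2 as $q) {
            if (abs($p - $q) <= $maxDistance) {
                return true;
            }
        }
    }
    return false;
}

$text = 'Jakarta was the umbrella project that once hosted Apache Lucene';
var_dump(withinDistance($text, 'jakarta', 'apache', 10));
```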

SO, NOW, CAN I MAKE A MINI GOOGLE?

YES, YOU CAN!

- Apache Nutch is already there
- Open-source web-search software project
- Built on Lucene, and can use Solr for indexing and search

INTERESTED? READ MORE…
- http://lucene.apache.org/solr/
- http://wiki.apache.org/solr
- http://lucene.apache.org/java/docs/scoring.html
- http://lucene.apache.org/java/3_5_0/queryparsersyntax.html
- http://www.slideshare.net/erikhatcher/solr-search-at-the-speed-of-light
- http://www.slideshare.net/pittaya/using-apache-solr

WHO AM I…
Murshed Ahmmad Khan, Head of Development,
http://www.usamurai.com, @usamurai, email: murshed2k@gmail.com

THANKS…

Questions?