Date post: | 07-May-2015 |
Category: | Technology |
Upload: | murshed-ahmmad-khan |
Apache-Solr! Enterprise Search Solutions at your Fingertips!
Murshed Ahmmad Khan @usamurai, [email protected]
Presented at phpXperts seminar 2011…
The criteria…
Enterprise Search Server
Fast
Flexible
Powerful
Scalable
Relevant Results
Production ready & Easy deployment
What’s in your mind, the name…
??
Apache Solr!
Fits all the above mentioned criteria…
Solr, What is it…?
- Open source Java application
- Runs as a standalone full-text search server within any servlet container
- Uses the Lucene Java search library as its core
SOLR WORK FLOW…
Solr History…
- Developed at CNET Networks by Yonik Seeley
- Donated to the ASF (Apache Software Foundation) in early 2006
Solr History…(2)
- Incubation period ended in January 2007; v1.2 followed later that year
- Solr is now maintained as a subproject of Lucene
Solr - Features…
Powerful Full-Text search…
Hit Highlighting…
Faceted Search…
Tag Clouds…
Geo-spatial search…
Solr – Features (cont…)
- Database integration
- Rich document (Word, PDF) handling
- REST-like HTTP/XML and JSON APIs (so you can code in virtually any language)
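Because the API is plain HTTP, a search is just a URL and the answer is just text. The sketch below is a minimal illustration in Python, not an official client: the base URL http://localhost:8080/solr is assumed from the installation slides, the helper names are made up for this example, and SAMPLE_RESPONSE is a hand-written stand-in shaped like a Solr JSON reply rather than real server output.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Return a select URL that asks Solr for a JSON response (wt=json)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


# Hand-written sample shaped like a Solr JSON response (not real server output).
SAMPLE_RESPONSE = (
    '{"response": {"numFound": 1, "start": 0, '
    '"docs": [{"id": "aawaj-profile-7"}]}}'
)


def extract_ids(response_text):
    """Pull the id of every returned document out of a JSON response body."""
    body = json.loads(response_text)
    return [doc["id"] for doc in body["response"]["docs"]]


# Assumed base URL; adjust to wherever Solr is deployed.
url = build_select_url("http://localhost:8080/solr", "service:aawaj", rows=5)
ids = extract_ids(SAMPLE_RESPONSE)
print(url)
print(ids)
```

Any language with an HTTP client and a JSON or XML parser can follow the same two steps.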
CLIENT API SUPPORT…
- Java (SolrJ)
- .NET (solrnet, SolrSharp)
- PHP (SolPHP)
- Python (SolPython)
- Ruby (on Rails) (rsolr, acts-as-solr, sunspot)
- C++
- XML/HTTP
- JSON/HTTP (AJAX Solr)
- Perl (SolPerl)
- and more
Solr - Features… (cont…)
- Flexible configuration
- Extensive plugin architecture for advanced customization
- Scalable distributed search, dynamic clustering, index replication
Alternatives to Solr
- Use Google (GSA, which has integration problems)
- FAST (stopped supporting Linux)
- Use Lucene (and write code on top of it)
Alternatives to Solr…(2)
- Use your database (has performance issues)
- Sphinx (written in C++)
- Commercial libraries (e.g. lucidimagination.com)
- Write your own
Who Uses Solr/Lucene?
More names: http://wiki.apache.org/solr/PublicServers
OPERATING SYSTEM SUPPORT
- Any OS with a Java VM, including:
  - Linux (all versions)
  - Windows (all versions)
  - MacOS (all versions)
  - Unix variants
APP SERVER SUPPORT
- Apache Tomcat
- Jetty
- Resin
- WebLogic™
- WebSphere™
- GlassFish
- dmServer™
- JBoss™
- and many more
Requirement: Java JDK 1.5 or later
INSTALLATION
1. Download the latest version of apache-solr and tomcat.
2. Extract both archives:
   $ tar -xzvf ./apache-solr-1.4.1.tgz
   $ tar -xzvf ./apache-tomcat-6.0.35.tar.gz
3. Copy the Solr .war file into the Tomcat webapps folder as solr.war:
   $ cp apache-solr-1.4.1/dist/apache-solr-1.4.1.war apache-tomcat-6.0.35/webapps/solr.war
4. Copy the example/solr directory into the Tomcat home directory:
   $ cp -r apache-solr-1.4.1/example/solr apache-tomcat-6.0.35/
5. Start the Tomcat server from its home directory:
   $ cd apache-tomcat-6.0.35
   $ ./bin/startup.sh
6. Visit http://localhost:8080/solr/admin/
YOU ARE DONE…
CREATE SCHEMA.XML
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="service" type="string" indexed="true" stored="true" required="true" />
<field name="contentType" type="string" indexed="true" stored="true" required="true" />
<field name="dbId" type="long" indexed="true" stored="true" />
<field name="content" type="text" indexed="true" stored="true" />
<!-- the copyField destination must itself be declared as a field -->
<field name="all" type="text" indexed="true" stored="false" multiValued="true" />
<copyField source="*" dest="all" />
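The required="true" attributes mean Solr will reject a document that lacks those fields. As a rough illustration of that rule, the Python sketch below reads the field definitions from this slide and reports which required fields a document is missing. The wrapping <fields> element, the SCHEMA_FIELDS constant and both function names are assumptions made for this example; this is a client-side check, not how Solr itself validates documents.

```python
import xml.etree.ElementTree as ET

# Field definitions copied from the slide, wrapped in a <fields> element
# so the fragment parses as a single XML document.
SCHEMA_FIELDS = """
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="service" type="string" indexed="true" stored="true" required="true" />
  <field name="contentType" type="string" indexed="true" stored="true" required="true" />
  <field name="dbId" type="long" indexed="true" stored="true" />
  <field name="content" type="text" indexed="true" stored="true" />
</fields>
"""


def required_fields(schema_xml):
    """Return the names of all fields marked required="true"."""
    root = ET.fromstring(schema_xml)
    return [f.get("name") for f in root.iter("field") if f.get("required") == "true"]


def missing_fields(document, schema_xml):
    """Return the required field names that the document (a dict) lacks."""
    return [name for name in required_fields(schema_xml) if name not in document]


incomplete = {"id": "aawaj-profile-7", "service": "aawaj"}
print(required_fields(SCHEMA_FIELDS))
print(missing_fields(incomplete, SCHEMA_FIELDS))
```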
INDEX DOCUMENTS (INDEXER)
The Common Loop
INDEX DOCUMENTS
1. <add> Add single/multiple documents
$doc = new SolrSimpleDocument(array(
    new SolrSimpleField('id', 'aawaj-profile-' . $user->id),
    new SolrSimpleField('service', 'aawaj'),
    new SolrSimpleField('contentType', 'profile'),
    new SolrSimpleField('dbId', (string)$user->id)
));
$this->solr->add($doc);
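Under the hood, a client library turns each document into an XML update message and posts it to Solr. The Python sketch below shows roughly what that message looks like for the profile document on this slide. The function name build_add_message and the sample id value 42 are illustrative assumptions; the PHP client on the slide may assemble its request differently.

```python
import xml.etree.ElementTree as ET


def build_add_message(documents):
    """Build a Solr XML <add> message from a list of field-name/value dicts."""
    add = ET.Element("add")
    for document in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in document.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical user id 42, mirroring the fields set in the PHP snippet.
message = build_add_message([
    {
        "id": "aawaj-profile-42",
        "service": "aawaj",
        "contentType": "profile",
        "dbId": 42,
    }
])
print(message)
```

Passing several dicts produces one <add> element with several <doc> children, which is how multiple documents travel in a single request.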
INDEX DOCUMENTS
2. <commit/>
Commit all pending documents at once so that they become searchable.
$this->solr->commit();
INDEX DOCUMENTS
3. <optimize/>
Optimize the index to improve search performance.
$this->solr->optimize();
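Commit and optimize are also plain XML messages posted to the update handler. The Python sketch below only prepares such requests without sending them; the base URL is assumed from the installation slides and update_request is a name made up for this example.

```python
from urllib.request import Request


def update_request(base_url, command):
    """Prepare (but do not send) a POST carrying <commit/> or <optimize/>."""
    if command not in ("commit", "optimize"):
        raise ValueError(f"unsupported update command: {command}")
    return Request(
        f"{base_url}/update",
        data=f"<{command}/>".encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


# Assumed base URL; adjust to wherever Solr is deployed.
commit = update_request("http://localhost:8080/solr", "commit")
optimize = update_request("http://localhost:8080/solr", "optimize")
print(commit.get_method(), commit.full_url, commit.data)
print(optimize.get_method(), optimize.full_url, optimize.data)
```

Passing either request to urllib.request.urlopen would send it to a running server. Committing once after a batch of adds is usually cheaper than committing after every document.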
SOLR QUERY SYNTAXES
QUERY SYNTAXES (RDBMS)
SELECT * FROM post
WHERE (topic LIKE '%apache%' OR author LIKE '%kabir%')
   OR (topic LIKE '%solr%' OR author LIKE '%frank%')
ORDER BY id DESC
QUERY SYNTAXES (SOLR)
topic:"The Right Way" AND author:WrongGuy
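Queries like this one are ordinary strings, so application code can assemble them from user input. The Python sketch below is one minimal way to do that; the helper names term and all_of are made up for this example, and only phrases are quoted and escaped here. Special characters inside single terms are left untouched, which a real application would need to handle.

```python
def term(field, value):
    """Return field:value, quoting the value as a phrase if it has whitespace."""
    if any(ch.isspace() for ch in value):
        # Escape backslashes and double quotes so the phrase stays intact.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


def all_of(*clauses):
    """Join clauses so that every one of them must match."""
    return " AND ".join(clauses)


query = all_of(term("topic", "The Right Way"), term("author", "WrongGuy"))
print(query)
```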
BOOSTING TERMS
topic:("jakarta apache"^4 "Apache Lucene")
FUZZY SEARCH (SOLR)
topic:roam~ (terms similar in spelling to "roam")
matches terms such as "foam" and "roams", based on the Levenshtein distance (edit distance) algorithm
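To make the edit-distance idea concrete, here is a small Python sketch of the classic dynamic-programming Levenshtein computation. It only shows why "foam" and "roams" count as close to "roam"; how Lucene turns the distance into a match threshold is not covered, and the function name is chosen for this example.

```python
def edit_distance(a, b):
    """Return the Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


for candidate in ("foam", "roams", "solr"):
    print(candidate, edit_distance("roam", candidate))
```

"foam" needs one substitution and "roams" one insertion, so both are a single edit away from "roam".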
PROXIMITY SEARCH (SOLR)
"jakarta apache"~10
searches for "apache" and "jakarta" within 10 words of each other in a document
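The idea behind a proximity match can be sketched in a few lines of Python. This is a simplified stand-in that splits on whitespace and compares word positions; Lucene works on analyzed tokens and measures the distance as position moves, so results can differ at the edges. The function name and the sample sentence are made up for this example.

```python
def within_distance(text, first, second, max_gap):
    """Return True if both words occur within max_gap positions of each other."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_gap
        for i in first_positions
        for j in second_positions
    )


sentence = "The Jakarta project hosts Apache software"
print(within_distance(sentence, "jakarta", "apache", 10))
print(within_distance(sentence, "jakarta", "apache", 2))
```

In the sample sentence the two words sit three positions apart, so a gap of 10 matches and a gap of 2 does not.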
SO, NOW, CAN I MAKE A MINI GOOGLE?
YES, YOU CAN!
- Apache Nutch is already there
- Open source web-search software project
- Builds on Lucene and Solr
INTERESTED? READ MORE…
- http://lucene.apache.org/solr/
- http://wiki.apache.org/solr
- http://lucene.apache.org/java/docs/scoring.html
- http://lucene.apache.org/java/3_5_0/queryparsersyntax.html
- http://www.slideshare.net/erikhatcher/solr-search-at-the-speed-of-light
- http://www.slideshare.net/pittaya/using-apache-solr
WHO AM I…
Murshed Ahmmad Khan, head of development,
http://www.usamurai.com @usamurai email: [email protected]
THANKS…
Questions?