Date post: | 15-Jan-2015 |
Category: |
Technology |
Upload: | ritesh-puthran |
Introduction to Sphinx .
Sphinx Searching and Sorting Features.
Sphinx Implementation.
Demo.
Open Source Search Engine.
Developed by Andrew Aksyonoff
Integrates well with MySQL.
Provides greatly improved full-text search.
Specially designed for indexing databases.
1. Search over 500 MB of documents.
2. 3,000,000 documents in total.
3. Query: “internet web design” (match any).
4. Returns 134,000 documents.
It has two standalone programs:
o Indexer: pulls data from the DB and builds indexes.
o Searchd: uses the indexes and answers queries.
Clients interact with searchd:
o Via native APIs: PHP, Python, Perl, Ruby, and Java.
o Via SphinxSE.
Indexer periodically rebuilds the indexes:
o Typically using cron jobs.
o Searching keeps working during rebuilds (Live Updates).
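The cron-driven rebuild can be sketched as a small helper that assembles the indexer command line. This is a minimal sketch, not part of the original deck: the function names, the schedule, and the config path are assumptions for illustration. The `--all` and `--rotate` switches are standard indexer options; `--rotate` is what lets a running searchd keep answering from the old index files until the new ones are ready.

```python
import shlex


def build_reindex_command(config_path, index_name=None):
    """Return the argv list for an indexer run (hypothetical helper)."""
    cmd = ["indexer", "--config", config_path]
    # Rebuild one named index, or every index in the config with --all.
    cmd.append(index_name if index_name else "--all")
    # --rotate asks the running searchd to swap in the new index files,
    # so queries are still served while the rebuild is in progress.
    cmd.append("--rotate")
    return cmd


def crontab_line(schedule, config_path):
    """Format a crontab entry that runs the rebuild on the given schedule."""
    return f"{schedule} {shlex.join(build_reindex_command(config_path))}"


# Example: rebuild every 30 minutes (schedule and path are assumptions).
print(crontab_line("*/30 * * * *", "/etc/sphinx/sphinx.conf"))
```

The printed line could be pasted into a crontab; running the command directly from Python would only need `subprocess.run(build_reindex_command(...))`.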
Sphinx documents = Records in DB.
I. A document is just like a row in the DB and has its own unique ID.
II. Each document consists of fields and attributes.
III. Fields are the columns on which we want to search.
IV. Attributes may be used for filtering, sorting, and grouping.
1. The Sphinx search engine returns only unique document IDs.
2. This means that if we search for “Dominos”, we get the unique IDs of the rows containing it.
3. Hence, after the search returns results, you will likely still need to fetch the details of those documents for your final result page.
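The two-step pattern described above (search returns IDs, then the application fetches the rows) might look like the following sketch. An in-memory SQLite database stands in for the real database, and the `restaurants` table, its columns, and the `ranked_ids` list are all hypothetical; in practice the IDs would come from searchd.

```python
import sqlite3


def fetch_details(conn, doc_ids):
    """Fetch full rows for the given document IDs, preserving search order."""
    if not doc_ids:
        return []
    placeholders = ",".join("?" for _ in doc_ids)
    rows = conn.execute(
        f"SELECT id, name, city FROM restaurants WHERE id IN ({placeholders})",
        list(doc_ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not guarantee order, so re-apply the ranking
    # that the search step produced.
    return [by_id[i] for i in doc_ids if i in by_id]


# In-memory SQLite stands in for the real database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [(1, "Dominos", "Mumbai"), (2, "Pizza Hut", "Pune"), (3, "Dominos", "Delhi")],
)

ranked_ids = [3, 1]  # hypothetical IDs from the search step, best match first
print(fetch_details(conn, ranked_ids))
```

Re-ordering in the application keeps the relevance order from the search step, which a plain `IN` lookup would otherwise lose.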
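-- Options inside the query string are separated by semicolons:
--   dominos                  the text you want to search for
--   mode=ext2                matching mode
--   weights=1000,100,10      per-field weight distribution
--   sort=attr_asc:group_id   sorting mode and attribute
SELECT id
FROM sphinx_table
WHERE query='dominos;mode=ext2;weights=1000,100,10;sort=attr_asc:group_id';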
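Since the whole request travels inside one string, application code usually assembles that string from separate options. The helper below is a hedged sketch of that idea; `build_sphinxse_query` is a hypothetical name, and rejecting semicolons in the search text is a simplification of this sketch rather than documented SphinxSE behaviour.

```python
def build_sphinxse_query(text, **options):
    """Join search text and option pairs into a SphinxSE query string."""
    if ";" in text:
        # Simplification: a semicolon would be read as an option separator.
        raise ValueError("search text must not contain ';'")
    parts = [text]
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        parts.append(f"{key}={value}")
    return ";".join(parts)


query = build_sphinxse_query(
    "dominos",
    mode="ext2",
    weights=[1000, 100, 10],
    sort="attr_asc:group_id",
)
print(query)

# The string is then bound as a parameter rather than pasted into the SQL:
sql = "SELECT id FROM sphinx_table WHERE query = %s"
```

Binding the string as a parameter leaves quoting to the database driver.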
SPH_MATCH_ALL : match all keywords.
SPH_MATCH_ANY : match any keywords.
SPH_MATCH_BOOLEAN : no relevance ranking; implicit Boolean AND between keywords
if not specified otherwise.
1. hello & world
2. hello | world
3. hello -world
SPH_MATCH_PHRASE : treats query as a phrase and requires a perfect match.
SPH_MATCH_EXTENDED : this has been superseded by SPH_MATCH_EXTENDED2.
SPH_MATCH_EXTENDED2 : provides a varied set of query operators, for example:
FIELD SEARCH OPERATOR : @title hello @body world.
QUORUM MATCHING OPERATOR : “world is wonderful place”/3.
PROXIMITY SEARCH OPERATOR : “hello world”~10.
STRICT ORDER OPERATOR : black << cat
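To make two of these operators concrete, here is a toy illustration of what quorum matching and strict ordering mean for a single document. It is not how Sphinx implements them (Sphinx works on its indexes, not raw text); the function names and the whitespace tokenizer are assumptions of this sketch.

```python
def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def quorum_match(doc, keywords, threshold):
    """True if at least `threshold` of the distinct keywords occur in doc."""
    words = set(tokenize(doc))
    hits = sum(1 for k in set(tokenize(keywords)) if k in words)
    return hits >= threshold


def strict_order_match(doc, ordered_keywords):
    """True if the keywords occur in doc in the given order (gaps allowed)."""
    remaining = iter(tokenize(doc))
    # `k in remaining` consumes the iterator up to the first hit, so each
    # keyword must be found after the previous one.
    return all(k in remaining for k in ordered_keywords)


# "world is wonderful place"/3 : any 3 of the 4 words are enough.
print(quorum_match("the world is a wonderful place", "world is wonderful place", 3))
# black << cat : "black" must appear before "cat".
print(strict_order_match("a black and white cat", ["black", "cat"]))
print(strict_order_match("cat in black", ["black", "cat"]))
```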
Phrase Ranking : Gives higher preference to documents containing the matching phrase, e.g. “hello world”.
Statistical Ranking : Gives more preference to word frequency, i.e.
a document containing more occurrences of “hello” and/or “world” is given more weight.
SPH_MATCH_BOOLEAN : No weighting performed.
SPH_MATCH_ALL and SPH_MATCH_PHRASE : Use phrase ranking.
SPH_MATCH_ANY : Phrase rank * big value + statistical rank (the phrase rank is multiplied by a big value to guarantee that a higher phrase rank wins even if its field weight is low).
SPH_MATCH_EXTENDED : (Phrase rank + BM25) * 1000.
Personalized Weighting : This is done with the “weights” keyword in your Sphinx query. It is generally used when some searched columns should count for more than others, e.g. weights=1,2,3; (this is possible in mode=ext2).
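The interplay of phrase rank and per-field weights can be illustrated with a toy scorer. This is a deliberately simplified sketch, not Sphinx's ranking code: here the phrase rank of a field is taken to be the longest run of query words that appear consecutively, in order, in that field, and the final score is the weighted sum over fields. The field layout (title, body) and the weights are assumptions.

```python
def phrase_rank(field_text, query):
    """Longest run of consecutive query words found consecutively in the field."""
    doc = field_text.lower().split()
    q = query.lower().split()
    best = 0
    for i in range(len(doc)):
        for j in range(len(q)):
            k = 0
            while i + k < len(doc) and j + k < len(q) and doc[i + k] == q[j + k]:
                k += 1
            best = max(best, k)
    return best


def weighted_score(fields, query, weights):
    """Sum of per-field phrase ranks, each multiplied by its field weight."""
    return sum(phrase_rank(text, query) * w for text, w in zip(fields, weights))


# Fields are (title, body); the title is weighted ten times the body.
doc_a = ["hello world", "a page that says hello"]
doc_b = ["greetings", "hello to the whole world"]
weights = [10, 1]

print(weighted_score(doc_a, "hello world", weights))
print(weighted_score(doc_b, "hello world", weights))
```

A document with the exact phrase in a heavily weighted field scores far above one that only contains the words scattered in a low-weight field, which is the effect the “weights” option is meant to give.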
SPH_SORT_RELEVANCE : Sorts by Relevance in DESC order.
SPH_SORT_ATTR_DESC : Sorts by an Attribute in DESC order.
SPH_SORT_ATTR_ASC : Sorts by an Attribute in ASC order.
SPH_SORT_TIME_SEGMENTS : Sorts by time segment (last hour/day/week/month) in DESC order, then by relevance in DESC order.
SPH_SORT_EXTENDED : Sorts by an SQL-like combination of columns in ASC/DESC order, e.g. @weight DESC, group_id ASC.
SPH_SORT_EXPR : Sorts by an arithmetic expression involving columns.
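The effect of the last two modes can be mimicked on a small in-memory result set. This is only an illustration of the resulting order: the match records and attribute names are made up, and expression sorting is assumed here to put the highest value first.

```python
# Hypothetical matches: document ID, relevance weight, and one attribute.
matches = [
    {"id": 1, "weight": 2, "group_id": 7},
    {"id": 2, "weight": 5, "group_id": 3},
    {"id": 3, "weight": 5, "group_id": 1},
]


def sort_extended(rows):
    """Order comparable to an extended clause: weight DESC, group_id ASC."""
    return sorted(rows, key=lambda m: (-m["weight"], m["group_id"]))


def sort_expr(rows, expr):
    """Order by a computed value, highest first (assumed direction)."""
    return sorted(rows, key=expr, reverse=True)


print([m["id"] for m in sort_extended(matches)])
print([m["id"] for m in sort_expr(matches, lambda m: m["weight"] * 10 + m["group_id"])])
```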
Installation is usually straightforward:
REQUIREMENTS:
A good working C++ compiler.
A good make program.
STEPS:
$ ./configure --prefix /path --with-mysql --with-pgsql
$ make
$ make install
Checking SphinxSE Installation
There are 2 components that we need to set up before Sphinx is ready for searching:
•Sphinx Table
•Configuration File (e.g.: file_name.conf )
Requirements:
1. The data types of the first 3 columns must be INT, INT, VARCHAR;
they are mapped to the document ID, the match weight, and the search query.
2. The query column must be indexed, and no other column may be indexed.
3. All other attributes in the source come as additional columns.
CREATE TABLE sphinx_table
(
    id     INT NOT NULL,            -- document ID
    weight INT NOT NULL,            -- match weight
    query  VARCHAR(255) NOT NULL,   -- search query
    KEY (query)
) ENGINE=SPHINX CONNECTION='sphinx://localhost:3313/city_search_cust_mess';
A configuration file has 4 sections to configure:
• Source (multiple)
• Index (multiple)
• Indexer
• Searchd
Following are some of the options available in the source section of the configuration file:
TYPE:
type: data source type.
possible options: mysql,pgsql,xmlpipe,xmlpipe2.
Connection Info:
sql_host : SQL server host to connect (Mandatory).
sql_port : SQL server port to connect to (default 3306).
sql_user : SQL user to use when connecting to sql_host (Mandatory).
sql_pass : SQL user password to use when connecting to sql_host (Mandatory).
sql_db : SQL DB to be used.
sql_sock : socket name to connect to for local SQL servers.
Queries Info:
sql_query_pre : pre-fetch query, or pre-query.
e.g.: sql_query_pre = SET NAMES utf8
sql_query : main document fetch query.
sql_query_post : Post-fetch query.
e.g.: sql_query_post= DROP TABLE my_tmp_table
sql_query_info : Document info query (used only by the command-line search utility to display document details, for debugging).
Attributes Info:
sql_attr_xxx : attribute declaration (xxx : uint, bigint, float, str2ordinal, timestamp).
Following are some of the options available in the index section of the configuration file:
type : index type. Optional (possible options: local, distributed).
source : adds a document source to a local index. Multi-value.
path : index files path and file name (without extension).
docinfo : document attribute values storage mode (inline, extern).
mlock : memory locking for cached data (optional, default 0).
min_word_len : minimum indexed word length (optional, default 1).
charset_type : character set encoding type.
Stemming Options:
morphology : a list of morphology preprocessors to apply.
e.g.: cars -> car; running -> run.
stopwords : stopword files list (space separated).
e.g.: the, is, are, an, a, etc.
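What min_word_len, stopwords, and morphology do to incoming text can be pictured with a toy term filter. This is a sketch under loud assumptions: the stem table is a hand-written stand-in for a real stemmer, and the order in which the three steps are applied here is a simplification, not a statement about Sphinx internals.

```python
STOPWORDS = {"the", "is", "are", "an", "a"}
# Hypothetical stand-in for a stemmer: maps word forms to their stems.
STEMS = {"cars": "car", "running": "run"}


def index_terms(text, min_word_len=1, stopwords=STOPWORDS, stems=STEMS):
    """Return the terms that would be kept for indexing in this toy model."""
    terms = []
    for word in text.lower().split():
        # Drop words that are too short or listed as stopwords.
        if len(word) < min_word_len or word in stopwords:
            continue
        # Replace the word with its stem when one is known.
        terms.append(stems.get(word, word))
    return terms


print(index_terms("The cars are running", min_word_len=2))
```

Because the same processing is applied to queries, a search for “car” can then match a document that only contains “cars”.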
Setting Configuration File: Indexer Section
mem_limit : indexing RAM usage limit. Optional, default is 32 MB.
max_iops : maximum I/O operations per second.
max_iosize : maximum allowed I/O operation size.
Setting Configuration File: Searchd Section
address : IP address to bind on. The default, 0.0.0.0, listens on all interfaces.
port : searchd TCP port number (mandatory, default is 3312).
log : log file name (optional, default is empty).
query_log : query log file name (optional, default is empty).
pid_file : searchd process ID file name (mandatory).
max_matches : maximum number of matches that the daemon keeps in RAM for each index and can return to the client (optional, default 1000).
preopen_indexes : whether to forcibly preopen all indexes on startup (optional, default 0, i.e. don’t preopen).
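Putting the four sections together, the sketch below renders a minimal configuration file from Python dictionaries. It only uses option names listed in the slides. The source name `city_src`, the credentials, the SQL query, and all file paths are hypothetical; the index name and port are taken from the CONNECTION string of the SphinxSE table shown earlier.

```python
def render_section(kind, name, options):
    """Render one config block, e.g. 'source city_src { ... }'."""
    header = f"{kind} {name}" if name else kind
    lines = [header, "{"]
    lines += [f"    {key} = {value}" for key, value in options.items()]
    lines.append("}")
    return "\n".join(lines)


config = "\n\n".join([
    render_section("source", "city_src", {
        "type": "mysql",
        "sql_host": "localhost",      # hypothetical connection details
        "sql_user": "app",
        "sql_pass": "secret",
        "sql_db": "cities",
        "sql_query": "SELECT id, group_id, name FROM city",
        "sql_attr_uint": "group_id",  # attribute used for sorting/filtering
    }),
    render_section("index", "city_search_cust_mess", {
        "source": "city_src",
        "path": "/var/data/city_search",  # hypothetical path
        "docinfo": "extern",
        "min_word_len": 1,
    }),
    render_section("indexer", None, {"mem_limit": "32M"}),
    render_section("searchd", None, {
        "port": 3313,
        "log": "/var/log/searchd.log",
        "pid_file": "/var/run/searchd.pid",
        "max_matches": 1000,
    }),
])
print(config)
```

The source and index sections carry a name because a file may define several of each, while indexer and searchd appear once.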