Elastic search Walkthrough

Post on 26-Jan-2015


Elastic Search

Concepts

• Elasticsearch is an open source (Apache 2), distributed, RESTful search engine built on top of Apache Lucene.

• Schema-free and document-oriented.

• Supports the JSON model.

• Elasticsearch lets you control exactly how a JSON document is mapped into the search engine, per type and per index.

• Multi-tenancy: support for more than one index, and for more than one type per index.

• Distributed nature: indices are broken down into shards, each shard with 0 or more replicas.

• In RDBMS terms, an index corresponds to a database, a type to a table, a document to a table row, and a field to a table column.

Create Index

• The create index API instantiates an index.

• Curl example that creates the sales index (index names must be lowercase):

$ curl -XPOST 'http://localhost:9200/sales/'

• Each index can be created with its own settings. The following example creates the sales index with 3 shards, each with 2 replicas:

$ curl -XPOST 'http://localhost:9200/sales/' -d '{
  "settings" : { "number_of_shards" : 3, "number_of_replicas" : 2 }
}'

• Reference link: http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html
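The settings body above can also be built programmatically. The following is a minimal Python sketch, not part of the original slides; the helper names `create_index_body` and `total_shard_copies` are hypothetical. It also works out how many shard copies the example settings imply: every primary shard is stored once, plus once per replica.

```python
import json


def create_index_body(shards, replicas):
    """Build the JSON body for a create-index request with explicit settings."""
    settings = {"number_of_shards": shards, "number_of_replicas": replicas}
    return json.dumps({"settings": settings})


def total_shard_copies(shards, replicas):
    """Each primary shard is stored once, plus once per replica."""
    return shards * (1 + replicas)


body = create_index_body(3, 2)       # same settings as the curl example
copies = total_shard_copies(3, 2)    # 3 primaries * (1 + 2 replicas)
print(body)
print(copies)
```

With 3 shards and 2 replicas the index consists of 9 shard copies in total (3 primaries and 6 replicas).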

Mapping

• Mapping is the process of defining how a document should be mapped to the search engine.

• If no mapping is defined, Elasticsearch guesses the type of each field and maps it accordingly.

• In ES, an index may store documents of different "mapping types".

• The put mapping API registers a specific mapping definition for a specific type. Example: mapping for the order1 type:

curl -XPOST 'http://localhost:9200/sales/order1/_mapping' -d '{
  "order1": {
    "properties": {
      "entity_id": {"type": "integer"},
      "increment_id": {"type": "string", "index": "not_analyzed"},
      "status": {"type": "string"}
    }
  }
}'

Mapping

• Get the mappings available in an index. The following curl example returns every type in the sales index together with its mapping:

curl -XGET 'localhost:9200/sales/_mapping?pretty=1'

• Get the mapping of a single type:

curl -XGET 'localhost:9200/sales/order1/_mapping?pretty=1'

• Reference link: http://www.elasticsearch.org/guide/reference/mapping/index.html
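To make the shape of a put mapping body explicit, here is a small Python sketch that rebuilds the order1 mapping from a field list. It is an illustration only: the helper `mapping_body` and its `(type, analyzed)` tuple convention are assumptions made for this example, not part of any Elasticsearch client.

```python
import json


def mapping_body(type_name, fields):
    """Build a put-mapping body.

    `fields` maps a field name to (type, analyzed). Fields with
    analyzed=False get "index": "not_analyzed", so their value is
    indexed as-is instead of being split into tokens.
    """
    properties = {}
    for name, (field_type, analyzed) in fields.items():
        spec = {"type": field_type}
        if not analyzed:
            spec["index"] = "not_analyzed"
        properties[name] = spec
    return {type_name: {"properties": properties}}


order_mapping = mapping_body("order1", {
    "entity_id": ("integer", True),
    "increment_id": ("string", False),
    "status": ("string", True),
})
print(json.dumps(order_mapping, indent=2))
```

The printed JSON is the same document that the curl example sends with -d.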

Add document

• The following example inserts a JSON document into the "sales" index, under a type called "order1", with an id of 1:

curl -XPOST 'http://localhost:9200/sales/order1/1' -d '{
  "entity_id": 1,
  "increment_id": "1000001",
  "status": "shipped"
}'

• Reference link: http://www.elasticsearch.org/guide/reference/api/index_.html

GET API (Get data)

• The get API retrieves a typed JSON document from the index based on its id. The following example gets a JSON document from an index called sales, under a type called order1, with id 1:

curl -XGET 'localhost:9200/sales/order1/1?pretty=1'

• The get operation can return only a chosen set of fields when the fields parameter is passed. Multiple query-string parameters are joined with "&". For example:

curl -XGET 'localhost:9200/sales/order1/1?fields=entity_id&pretty=1'

• Reference link: http://www.elasticsearch.org/guide/reference/api/get.html

• For the multi get API, see: http://www.elasticsearch.org/guide/reference/api/multi-get.html
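Combining several query-string parameters on a get request is easy to get wrong by hand. The sketch below, in Python and using only the standard library, assembles such a URL; the function name `get_url` and its arguments are hypothetical, and the host is the localhost:9200 used throughout these slides.

```python
from urllib.parse import urlencode


def get_url(host, index, doc_type, doc_id, fields=None, pretty=False):
    """Build a get-API URL; parameters are joined with '?' and '&'."""
    params = {}
    if fields:
        params["fields"] = ",".join(fields)
    if pretty:
        params["pretty"] = "1"
    url = "http://{}/{}/{}/{}".format(host, index, doc_type, doc_id)
    if params:
        # safe="," keeps the comma between field names readable
        url += "?" + urlencode(params, safe=",")
    return url


url = get_url("localhost:9200", "sales", "order1", 1,
              fields=["entity_id"], pretty=True)
print(url)
```

The first parameter is introduced by "?" and every further one by "&", which is what urlencode takes care of here.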

Search API (Search data)

• The search API executes a search query and returns the search hits that match it. The following query returns the documents whose entity_id is 1:

curl -XGET 'http://localhost:9200/sales/order1/_search' -d '{
  "query" : { "term" : { "entity_id" : 1 } }
}'

• Additional parameters for the search API include from, size, search_type, sort and fields:

curl -XGET 'http://localhost:9200/sales/order1/_search' -d '{
  "query" : { "term" : { "status" : "confirmed" } },
  "from" : 0, "size" : 1,
  "sort" : [ { "entity_id" : "desc" } ],
  "fields" : ["entity_id", "increment_id"]
}'

• Reference link: http://www.elasticsearch.org/guide/reference/api/search/request-body.html
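The from and size parameters together implement paging: from is the offset of the first hit, size the number of hits returned. A minimal Python sketch of a request-body builder follows; the function `search_body` and its page-based arguments are assumptions made for illustration.

```python
import json


def search_body(field, value, page=0, page_size=10,
                sort_field=None, descending=False, fields=None):
    """Build a term-query search body with paging, sorting and field selection."""
    body = {
        "query": {"term": {field: value}},
        "from": page * page_size,   # offset of the first hit on this page
        "size": page_size,          # number of hits per page
    }
    if sort_field:
        body["sort"] = [{sort_field: "desc" if descending else "asc"}]
    if fields:
        body["fields"] = list(fields)
    return body


first = search_body("status", "confirmed", page=0, page_size=1,
                    sort_field="entity_id", descending=True,
                    fields=["entity_id", "increment_id"])
third = search_body("status", "confirmed", page=2, page_size=20)
print(json.dumps(first))
print(third["from"])
```

The first body matches the second curl example above; the second shows that page 2 with 20 hits per page starts at offset 40.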

Multi-Search API (Search data)

• The search API can be applied to multiple types within an index, and across multiple indices using the multi-index syntax. For example, we can search all documents across all types within the sales index:

curl -XGET 'http://localhost:9200/sales/_search?q=status:confirmed'

• We can also search within specific types:

curl -XGET 'http://localhost:9200/sales/order,order1/_search?q=status:confirmed'

• We can also search all orders with a certain field value across several indices:

curl -XGET 'http://localhost:9200/sales,newsales/order1/_search?q=entity_id:1'

• We can search all orders across all available indices using the _all placeholder:

curl -XGET 'http://localhost:9200/_all/order1/_search?q=entity_id:1'

• We can even search across all indices and all types:

curl -XGET 'http://localhost:9200/_search?q=entity_id:1'
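The URL patterns above follow one rule: comma-separated index names, then comma-separated type names, then _search, with _all standing in when types are given without an index. A short Python sketch of that rule, with a hypothetical helper named `search_path`:

```python
def search_path(indices=(), types=()):
    """Build the path part of a search URL for any mix of indices and types."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # types without an explicit index need the _all placeholder
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path(["sales"]))
print(search_path(["sales"], ["order", "order1"]))
print(search_path(["sales", "newsales"], ["order1"]))
print(search_path(types=["order1"]))
print(search_path())
```

Each printed path corresponds to one of the five curl examples, in the same order.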

Update API

• The update API updates a document based on a provided script. The following example sets the status field of the document with id 1 to a new value:

curl -XPOST 'localhost:9200/sales/order1/1/_update' -d '{
  "script" : "ctx._source.status = newStatus",
  "params" : { "newStatus" : "confirmed" }
}'

• We can also add a new field to the document:

curl -XPOST 'localhost:9200/sales/order1/1/_update' -d '{
  "script" : "ctx._source.newField = \"new field introduced\""
}'

• We can also remove a field from the document:

curl -XPOST 'localhost:9200/sales/order1/1/_update' -d '{
  "script" : "ctx._source.remove(\"newField\")"
}'

• Reference link: http://www.elasticsearch.org/guide/reference/api/update.html
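The three scripts each act on ctx._source, the stored JSON document. The Python sketch below only mimics that effect on a plain dictionary to show what the document looks like after each step; it does not talk to Elasticsearch, and the helpers `set_field` and `remove_field` are hypothetical names.

```python
def set_field(source, name, value):
    """Mimic `ctx._source.<name> = value`: return a copy with the field set."""
    updated = dict(source)
    updated[name] = value
    return updated


def remove_field(source, name):
    """Mimic `ctx._source.remove("<name>")`: return a copy without the field."""
    updated = dict(source)
    updated.pop(name, None)
    return updated


doc = {"entity_id": 1, "increment_id": "1000001", "status": "shipped"}
doc = set_field(doc, "status", "confirmed")               # first example
with_new = set_field(doc, "newField", "new field introduced")  # second example
without_new = remove_field(with_new, "newField")          # third example
print(doc)
print(with_new)
print(without_new)
```

After the third step the document is back to the state it had after the status update.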

Delete API

• The delete API deletes a typed JSON document from a specific index based on its id. The following example deletes the JSON document from an index called sales, under a type called order1, with id 1:

curl -XDELETE 'http://localhost:9200/sales/order1/1'

• Delete an entire type:

curl -XDELETE 'http://localhost:9200/sales/order1'

• The delete by query API deletes documents from one or more indices and one or more types based on a query:

curl -XDELETE 'http://localhost:9200/sales/order1/_query?q=entity_id:1'

curl -XDELETE 'http://localhost:9200/sales/_query?q=entity_id:1'

curl -XDELETE 'http://localhost:9200/sales/order1/_query' -d '{ "term" : { "status" : "confirmed" } }'

Count API

• The count API executes a query and returns the number of matches for that query. It can be executed across one or more indices and across one or more types.

curl -XGET 'http://localhost:9200/sales/order/_count' -d '{"term":{"status":"confirmed"}}'

curl -XGET 'http://localhost:9200/_count' -d '{"term":{"status":"confirmed"}}'

curl -XGET 'http://localhost:9200/sales/order,order1/_count' -d '{"term":{"status":"confirmed"}}'

• Reference link: http://www.elasticsearch.org/guide/reference/api/count.html

Facet Search

• Facets provide aggregated data based on a search query.

• A terms facet returns counts for the individual values of a specific field. Elasticsearch supports more facet implementations, such as range, statistical or date histogram facets.

• The field used for facet calculations must be numeric, date/time, or analyzed as a single token.

• You can give each facet a custom name and return multiple facets in one request.

• Now, let's query the index for products that have category id 3 and retrieve a terms facet for the brands field. We will name the facet simply Brands (example of a terms facet):

curl -XGET 'localhost:9200/category/products/_search?pretty=1' -d '{
  "query": { "term": { "category_id": 3 } },
  "facets": {
    "Brands": { "terms": { "fields": ["brands"], "size": 10, "order": "term" } }
  }
}'

• Reference links:
http://www.elasticsearch.org/guide/reference/api/search/facets/
http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html
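Conceptually, a terms facet counts how often each value of a field occurs among the documents matched by the query. The Python sketch below reproduces that counting locally on a few invented product records; the data and the function `terms_facet` are made up for illustration, and the real computation happens inside Elasticsearch.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count the values of `field` over `docs`, ordered by term, capped at `size`."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None:
                counts[v] += 1
    # "order": "term" sorts the buckets alphabetically by term
    return sorted(counts.items())[:size]


# Hypothetical sample data
products = [
    {"category_id": 3, "brands": "acme"},
    {"category_id": 3, "brands": "zeta"},
    {"category_id": 3, "brands": "acme"},
    {"category_id": 5, "brands": "acme"},
]
matched = [p for p in products if p["category_id"] == 3]   # the term query
brands = terms_facet(matched, "brands")
print(brands)
```

Only the documents matched by the query contribute to the facet, so the product in category 5 is not counted.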

Facet search

• The range facet lets you specify a set of ranges and get both the number of documents (count) that fall within each range and aggregated data, based either on the same field or on another field.

curl -XGET 'localhost:9200/sales/order/_search?pretty=1' -d '{
  "query" : { "term" : { "status" : "confirmed" } },
  "facets" : {
    "range1" : {
      "range" : {
        "grand_total" : [
          { "to" : 50 },
          { "from" : 20, "to" : 70 },
          { "from" : 70, "to" : 120 },
          { "from" : 150 }
        ]
      }
    }
  },
  "sort" : [ { "entity_id" : "asc" } ]
}'

• Reference link: http://www.elasticsearch.org/guide/reference/api/search/facets/range-facet.html
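The ranges in the example overlap (the first two both cover 20 to 50) and leave a gap (120 to 150), so a document can be counted in several ranges or in none. The Python sketch below imitates the bucketing on invented grand_total values. It assumes that from is inclusive and to is exclusive; the function `range_facet` and the sample numbers are illustrative only.

```python
def range_facet(values, ranges):
    """Return count and total per range; assumes from is inclusive, to exclusive."""
    results = []
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        matched = [v for v in values
                   if (low is None or v >= low) and (high is None or v < high)]
        results.append({"from": low, "to": high,
                        "count": len(matched), "total": sum(matched)})
    return results


ranges = [{"to": 50}, {"from": 20, "to": 70}, {"from": 70, "to": 120}, {"from": 150}]
grand_totals = [10, 30, 70, 130, 200]   # hypothetical order totals
facet = range_facet(grand_totals, ranges)
for bucket in facet:
    print(bucket)
```

Here 30 is counted in both of the first two ranges, while 130 falls into the gap and is counted nowhere.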

Elastica

• Elastica is an open source PHP client for the Elasticsearch search engine/database.

• Reference link: http://www.elastica.io/en

• To use Elastica, download it and include it in a project using PHP autoload:

function __autoload_elastica($class) {
    $path = str_replace('_', '/', $class);
    if (file_exists('/xampp/htdocs/project/Elastica/lib/' . $path . '.php')) {
        require_once('/xampp/htdocs/project/Elastica/lib/' . $path . '.php');
    }
}
spl_autoload_register('__autoload_elastica');

• Connecting to Elasticsearch on a single node:

$elasticaClient = new Elastica_Client(array('host' => '192.168.0.27', 'port' => '9200'));

• It is quite easy to start an Elasticsearch cluster, simply by starting multiple instances of Elasticsearch on one server or on multiple servers. One of the goals of a distributed search index is availability: if one server goes down, search results should still be served. To connect to several nodes, pass them under the 'servers' key:

$elasticaClient = new Elastica_Client(array('servers' => array(
    array('host' => '192.168.0.27', 'port' => '9200'),
    array('host' => '192.168.0.27', 'port' => '9201')
)));

Elastica

• Create index:

$elasticaClient = new Elastica_Client(array('host' => '192.168.0.27', 'port' => '9200'));
$elasticaIndex = $elasticaClient->getIndex('sales');
$elasticaIndex->create(array('number_of_shards' => 4, 'number_of_replicas' => 1), true);

• Define mapping:

$mapping = new Elastica_Type_Mapping();
$elasticaIndex = $elasticaClient->getIndex('sales');
$elasticaType = $elasticaIndex->getType('order');
$mapping->setType($elasticaType);
$mapping->setProperties(array(
    'entity_id' => array('type' => 'integer'),
    'increment_id' => array('type' => 'string', 'index' => 'not_analyzed'),
    'status' => array('type' => 'string', 'index' => 'not_analyzed')
));
$mapping->send();

Elastica Add documents

$elasticaClient = new Elastica_Client(array('host' => '192.168.0.27', 'port' => '9200'));
$elasticaIndex = $elasticaClient->getIndex('sales');
$elasticaType = $elasticaIndex->getType('order');

// The id of the document
$id = 1;

// Create a document
$record = array('entity_id' => 1, 'increment_id' => '100001', 'status' => 'confirmed');
$recordDocument = new Elastica_Document($id, $record);

// Add record to type
$elasticaType->addDocument($recordDocument);

// Refresh index
$elasticaType->getIndex()->refresh();

Elastica Get Document

$elasticaClient = new Elastica_Client(array('host' => '192.168.0.27', 'port' => '9200'));
$index = $elasticaClient->getIndex('sales');   // get index
$type = $index->getType('order');              // get type
$id = 1;                                       // id of the document to fetch
$doc = $type->getDocument($id)->getData();     // get data

Elastica Update Document

$elasticaClient = new Elastica_Client(array('host' => '192.168.0.27', 'port' => '9200'));
$index = $elasticaClient->getIndex('sales');   // get index
$type = $index->getType('order');              // get type
$id = 1;                                       // id of the document to update
$newVal = 'confirmed';                         // value to be set
$update = new Elastica_Script("ctx._source.status = newval", array('newval' => $newVal));
$res = $type->updateDocument($id, $update);
if (!empty($res)) {
    $val = $res->getData();
    if ($val['ok']) {
        echo "updated";
    } else {
        echo "value not updated";
    }
} else {
    echo "value not updated";
}

Elastica Search Documents

• The search API executes a search query and returns the search hits that match it.

• The search API consists of the following major building blocks:

– Query String
– Term
– Terms
– Range
– Bool Query
– Filter (also includes Filter_Term, Filter_Range etc.)
– Facets (includes Facet_Range, Facet_Terms, Facet_Filter, Facet_Query, Facet_Statistical etc.)
– Query (where we can set the output fields, limit and sorting)

Search Documents – Query String

$elasticaClient = new Elastica_Client(array('host' => '192.168.0.27', 'port' => '9200'));
$elasticaIndex = $elasticaClient->getIndex('sales');
$elasticaType = $elasticaIndex->getType('order');

$elasticaQueryString = new Elastica_Query_QueryString();
$elasticaQueryString->setQuery('shipped*');
$elasticaQueryString->setFields(array('status')); // one or more fields can be set in a query string

$elasticaQuery = new Elastica_Query();
$elasticaQuery->setQuery($elasticaQueryString);
$elasticaQuery->setFields(array('increment_id', 'entity_id', 'billing_name', 'grand_total'));
$elasticaQuery->setFrom(0);
$elasticaQuery->setLimit(20);
$sort = array("entity_id" => "desc");
$elasticaQuery->setSort($sort);

$elasticaResultSet = $elasticaType->search($elasticaQuery);
$totalResults = $elasticaResultSet->getTotalHits();
$elasticaResults = $elasticaResultSet->getResults();
foreach ($elasticaResults as $elasticaResult) {
    print_r($elasticaResult->getData());
}

Search Documents – Query Term

$elasticaQueryTerm = new Elastica_Query_Term();
$elasticaQueryTerm->setTerm('entity_id', 1);

$elasticaQuery = new Elastica_Query();
$elasticaQuery->setQuery($elasticaQueryTerm);
$elasticaQuery->setFields(array('increment_id', 'entity_id', 'billing_name', 'grand_total'));
$elasticaQuery->setFrom(0);
$elasticaQuery->setLimit(20);
$sort = array("entity_id" => "asc");
$elasticaQuery->setSort($sort);

$elasticaResultSet = $elasticaType->search($elasticaQuery);

Search Documents – Query Terms

$elasticaQueryTerms = new Elastica_Query_Terms();

// for a terms query, you can specify one or more values per field
$elasticaQueryTerms->setTerms('entity_id', array(1, 2, 3, 4, 5));
$elasticaQueryTerms->addTerm(6);

$elasticaQuery = new Elastica_Query();
$elasticaQuery->setQuery($elasticaQueryTerms);
$elasticaQuery->setFields(array('increment_id', 'entity_id', 'billing_name', 'grand_total'));
$elasticaQuery->setFrom(0);
$elasticaQuery->setLimit(20);
$sort = array("entity_id" => "asc");
$elasticaQuery->setSort($sort);

$elasticaResultSet = $elasticaType->search($elasticaQuery);

Search Documents – Query Range

$elasticaQueryRange = new Elastica_Query_Range();

//for range query , you can specify from, from & to or to only

$elasticaQueryRange->addField('entity_id', array('from' => 10,"to"=>14));

$elasticaQuery = new Elastica_Query();
$elasticaQuery->setQuery($elasticaQueryRange);
$elasticaQuery->setFields(array('increment_id', 'entity_id', 'billing_name', 'grand_total'));
$elasticaQuery->setFrom(0);
$elasticaQuery->setLimit(20);
$sort = array("entity_id" => "asc");
$elasticaQuery->setSort($sort);

$elasticaResultSet = $elasticaType->search($elasticaQuery);

Search Documents – Bool Query

• The bool query maps to the Lucene BooleanQuery.

• A bool query contains clauses with an occurrence type: must, should or must_not.

$boolQuery = new Elastica_Query_Bool();

$elasticaQueryString = new Elastica_Query_QueryString();
$elasticaQueryString->setQuery('shoh*');
$elasticaQueryString->setFields(array('billing_name', 'shipping_name'));
$boolQuery->addMust($elasticaQueryString);

$elasticaQueryTerm = new Elastica_Query_Term();
$elasticaQueryTerm->setTerm('entity_id', 1);
$boolQuery->addMust($elasticaQueryTerm);

$elasticaQuery = new Elastica_Query();
$elasticaQuery->setQuery($boolQuery);
$elasticaResultSet = $elasticaType->search($elasticaQuery);
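For comparison with the curl examples earlier in the deck, the bool query built above corresponds to a JSON request body along the following lines. This Python sketch is an approximation of what the client would send, not output captured from Elastica; the helper `bool_must` is a hypothetical name.

```python
import json


def bool_must(*clauses):
    """Wrap query clauses so that every one of them must match."""
    return {"bool": {"must": list(clauses)}}


query_string = {"query_string": {"query": "shoh*",
                                 "fields": ["billing_name", "shipping_name"]}}
term = {"term": {"entity_id": 1}}

bool_body = {"query": bool_must(query_string, term)}
print(json.dumps(bool_body, indent=2))
```

Both clauses sit in the must list, so a document is a hit only if it matches the query string and has entity_id 1.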

Search Documents – Query Filters

• When doing things like facet navigation, sometimes only the hits need to be filtered by the chosen facet, while all the facets should still be calculated from the original query. The filter element within the search request can be used to accomplish this.

$elasticaQueryString = new Elastica_Query_QueryString();
$elasticaQueryString->setQuery('*');
$elasticaQueryString->setFields(array('increment_id'));

$filteredQuery = new Elastica_Query_Filtered(
    $elasticaQueryString,
    new Elastica_Filter_Range('created_at', array('from' => '2011-01-04 07:36:00', 'to' => '2013-01-04 19:36:25'))
);

$elasticaQuery = new Elastica_Query();
$elasticaQuery->setQuery($filteredQuery);
$elasticaResultSet = $elasticaType->search($elasticaQuery);
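There are two places a filter can go, and they behave differently with facets. The Python sketch below builds both request bodies side by side. It assumes the request syntax of the Elasticsearch versions these slides target (a `filtered` query and a top-level `filter` element); the helper names are hypothetical.

```python
import json

created_range = {"range": {"created_at": {"from": "2011-01-04 07:36:00",
                                          "to": "2013-01-04 19:36:25"}}}
match_all_ids = {"query_string": {"query": "*", "fields": ["increment_id"]}}


def filtered_query_body(query, filter_):
    """Filter inside the query: hits AND facets see only filtered documents."""
    return {"query": {"filtered": {"query": query, "filter": filter_}}}


def top_level_filter_body(query, filter_):
    """Top-level filter: hits are narrowed, facets still use the full query."""
    return {"query": query, "filter": filter_}


inside = filtered_query_body(match_all_ids, created_range)
outside = top_level_filter_body(match_all_ids, created_range)
print(json.dumps(inside))
print(json.dumps(outside))
```

The Elastica_Query_Filtered code corresponds to the first body; the behaviour described in the bullet, where facets ignore the filter, corresponds to the second.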

Elastica - Facet Terms

$elasticaQuery = new Elastica_Query();
$elasticaQuery->setQuery($boolQuery);   // set main query
$facet = new Elastica_Facet_Terms('status Facet');
$facet->setField('status');
$facet->setOrder('term');               // other options are reverse_term, count, reverse_count
$facet->setSize(5);

$elasticaQuery->addFacet($facet);       // add facet to query
$elasticaResultSet = $elasticaType->search($elasticaQuery);

$facets = $elasticaResultSet->getFacets();   // get facets data
foreach ($facets as $k => $v) {
    if (isset($v['terms']) && is_array($v['terms'])) {
        $data['facets'][$k] = $v['terms'];
    }
}