Real-time search in Drupal with Elasticsearch @Moldcamp

Post on 11-May-2015


Elasticsearch

Flexible and powerful open source, distributed real-time search and analytics engine for the cloud

Why use Elasticsearch?

● RESTful API
● Open Source
● JSON over HTTP
● based on Lucene
● distributed
● highly available
● schema free
● massively scalable

Setup in 2 steps:

1. Extract the archive
2. > bin/elasticsearch

How to use it?

> curl -XGET localhost:9200/?pretty

{
  "ok" : true,
  "status" : 200,
  "name" : "Infinity",
  "version" : {
    "number" : "0.90.1",
    "snapshot_build" : false,
    "lucene_version" : "4.3"
  },
  "tagline" : "You Know, for Search"
}

> curl -XGET localhost:9200/?pretty

● -XGET: action (verb)
● localhost:9200: node + port
● /: path
● ?pretty: query string
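The same breakdown can be reproduced with Python's standard library. This is only a sketch: the `http://` scheme is an assumption added so that `urlsplit` can recognise the host part, and the variable names are illustrative.

```python
from urllib.parse import urlsplit

# The request from the slide; "http://" is assumed, curl adds it implicitly.
url = "http://localhost:9200/?pretty"
parts = urlsplit(url)

verb = "GET"                   # action (verb), passed to curl as -XGET
node_and_port = parts.netloc   # node + port
path = parts.path              # path
query_string = parts.query     # query string

print(verb, node_and_port, path, query_string)
```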

Let's index some data

> PUT /index/type/id

● index: Where? It's very similar to a database in SQL.
● type: What? Like a table: content type, entity type, any kind of type you decide.
● id: Which? Node ID, entity ID, any kind of serial ID.

> PUT /mysite/node/1 -d

{
  "nid": "1",
  "status": "1",
  "title": "Hello elasticsearch",
  "body": "First elasticsearch document"
}

{
  "ok": true,
  "_index": "mysite",
  "_type": "node",
  "_id": "1",
  "_version": 1
}
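A minimal sketch of how the same indexing request could be prepared from Python with only the standard library. The helper name `build_index_request` and the default `base_url` are assumptions for illustration; the request is built but not sent, so no running node is needed to try it.

```python
import json
from urllib.request import Request

def build_index_request(index, doc_type, doc_id, document,
                        base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /index/type/id request."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})

node = {
    "nid": "1",
    "status": "1",
    "title": "Hello elasticsearch",
    "body": "First elasticsearch document",
}

request = build_index_request("mysite", "node", 1, node)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running node.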

Let's GET some data

> GET /mysite/node/1

{
  "_index" : "mysite",
  "_type" : "node",
  "_id" : "1",
  "_version" : 1,
  "exists" : true,
  "_source" : {
    "nid": "1",
    "status": "1",
    "title": "Hello elasticsearch",
    "body": "First elasticsearch document"
  }
}

> GET /mysite/node/1?fields=title,body

Get specific fields


> GET /mysite/node/1/_source

Get source only

Let's UPDATE some data

> PUT /mysite/node/1 -d

{
  "status": "0"
}

{
  "ok": true,
  "_index": "mysite",
  "_type": "node",
  "_id": "1",
  "_version": 2
}

UPDATE = DELETE + PUT (the new body replaces the whole document and _version is incremented)

Let's DELETE some data

> DELETE /mysite/node/1

{
  "ok": true,
  "found": true,
  "_index": "mysite",
  "_type": "node",
  "_id": "1",
  "_version": 3
}

Distributed, Highly Available

> PUT /new_index -d '{ "settings" : { "number_of_shards" : 3, "number_of_replicas" : 2 }}'
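The replica count applies to every primary shard, so the settings above ask for more shard copies than the two numbers suggest at first glance. A small sketch of the arithmetic (the function name is illustrative):

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard is kept once plus number_of_replicas times."""
    return number_of_shards * (1 + number_of_replicas)

primaries = 3
replicas = 2

# 3 primaries + 3 * 2 replicas
print(total_shard_copies(primaries, replicas))
```

With 3 shards and 2 replicas the cluster manages 9 shard copies in total.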

Concurrency, Version control

> PUT /myapp/node/1?version=1

{ "title": "hi girl" }

{ "_index": "myapp", "_type": "node", "_id": "1", "_version": 1, "created": false }

> PUT /myapp/node/1?version=1

{ "title": "hey boy" }

# 200

> PUT /myapp/node/1?version=1

{ "title": "hey boy" }

# 409

> version conflict, current [2], provided [1]
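The idea behind the 409 is optimistic concurrency control: a write that names a version only succeeds if that version is still current. The toy in-memory model below illustrates the principle only; `ToyIndex` and `VersionConflict` are made-up names, and this is not how Elasticsearch is implemented.

```python
class VersionConflict(Exception):
    """Raised when the provided version is not the current one."""

class ToyIndex:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source, version=None):
        current_version, _ = self._docs.get(doc_id, (0, None))
        # A versioned write is rejected if someone else wrote in between.
        if version is not None and version != current_version:
            raise VersionConflict(
                f"version conflict, current [{current_version}], "
                f"provided [{version}]"
            )
        new_version = current_version + 1
        # The whole document is replaced, never patched in place.
        self._docs[doc_id] = (new_version, dict(source))
        return new_version

index = ToyIndex()
first = index.put(1, {"title": "hi girl"})              # creates version 1
second = index.put(1, {"title": "hey boy"}, version=1)  # accepted, now version 2

try:
    index.put(1, {"title": "hey boy"}, version=1)       # stale version
    message = None
except VersionConflict as error:
    message = str(error)

print(first, second, message)
```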

Let's SEARCH for something

> GET /_search

{
  "took" : 32,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : { results... }
}

Let's SEARCH in multiple indices and types

> GET /index/_search

> GET /index/type/_search

> GET /index1,index2/_search

> GET /myapp_*/type,entity_*/_search

Let's PAGINATE results

> GET /_search?size=10&from=20

size = results per page
from = starting from
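Turning a page number into these two parameters is simple arithmetic. A sketch, assuming pages are numbered from 1 (the helper name `page_params` is illustrative):

```python
def page_params(page, size=10):
    """Translate a 1-based page number into size/from parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"size": size, "from": (page - 1) * size}

# The request above, size=10&from=20, is page 3 at 10 results per page.
print(page_params(3))
```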

Let's search oldschool

> GET /_search?q=title:elasticsearch

> GET /_search?q=nid:60

+title:awesome +status:1 +created:[1369917354 TO *]

?q=%2Btitle:awesome%20%2Bstatus:1%20%2Bcreated:[1369917354%20TO%20*]

The ugly encoding =)
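The encoding does not have to be done by hand. A sketch using Python's `urllib.parse.quote`; note that it is stricter than the hand-written version and also escapes the brackets and the asterisk, which decodes to the same query.

```python
from urllib.parse import quote, unquote

raw_query = "+title:awesome +status:1 +created:[1369917354 TO *]"

# Keep ":" readable; "+", spaces, brackets and "*" are percent-encoded.
encoded = quote(raw_query, safe=":")
search_path = "/_search?q=" + encoded

print(search_path)
```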

Query DSL style

> GET /_search -d

{
  "query": {
    "match": { "title": "awesome" }
  }
}

> GET /_search -d

{
  "query": {
    "match" : {
      "title" : {
        "query" : "+awesome -poor",
        "boost" : 2.0
      }
    }
  }
}
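Hand-written JSON is easy to get subtly wrong, so request bodies are often built as plain data structures and serialized. A minimal sketch; `match_query` is a hypothetical helper, not part of any client library.

```python
import json

def match_query(field, text, boost=None):
    """Build a Query DSL body with a single match clause."""
    clause = {"query": text}
    if boost is not None:
        clause["boost"] = boost
    return {"query": {"match": {field: clause}}}

body = match_query("title", "awesome", boost=2.0)
print(json.dumps(body, indent=2))
```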

Mappings and types

Core types
* string
* number
* date
* boolean

Complex types
* array type
* object type
* nested type

Others:
* ip type
* geo point
* geo shape
* attachments

Define type mapping

> PUT /myapp/node/_mapping -d

{
  "node" : {
    "properties" : {
      "message" : {
        "type" : "string",
        "store" : true
      }
    }
  }
}

Indexed fields

Full text: analyzed == is split into terms

Term: not analyzed == is stored as is

> PUT /myapp/node/_mapping -d

{
  "node" : {
    "properties" : {
      "name" : {
        "type" : "string",
        "store" : true,
        "index" : "not_analyzed"
      }
    }
  }
}

Dynamic mapping

Analysis and indexing

Inverted index

1. “The quick brown fox jumped over the lazy dog”

2. “Quick brown foxes leap over lazy dogs in summer”

Term     | Doc_1 | Doc_2
---------+-------+------
Quick    |       |   X
The      |   X   |
brown    |   X   |   X
dog      |   X   |
dogs     |       |   X
fox      |   X   |
foxes    |       |   X
in       |       |   X
jumped   |   X   |
lazy     |   X   |   X
leap     |       |   X
over     |   X   |   X
quick    |   X   |
summer   |       |   X
the      |   X   |
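The table can be rebuilt in a few lines. This sketch splits on whitespace only and does no lowercasing or stemming, which is why "Quick" and "quick", or "dog" and "dogs", end up as separate terms, exactly as in the table.

```python
from collections import defaultdict

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

def build_inverted_index(documents):
    """Map every term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

inverted = build_inverted_index(docs)
for term in sorted(inverted):
    print(f"{term:8} {sorted(inverted[term])}")
```

Searching for "quick" in this index only finds document 1, which is the problem analysis is meant to solve.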

Analyzer

Tokenizers

● standard
● keyword
● whitespace
● ngram

TokenFilters

● standard
● lowercase
● stop
● truncate
● snowball

> GET /_analyze?analyzer=standard -d 'this is a test baby'

{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 10,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "baby",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}
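A toy analyzer shows where those offsets and positions come from: tokenize, lowercase, then drop stop words while keeping the original position count. The stop-word set below is a small assumed subset, and the regular expression is a stand-in for the real tokenizer, so treat this as an illustration of the pipeline rather than the standard analyzer itself.

```python
import re

# Assumed subset of English stop words, for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "the", "this"}

def analyze(text):
    """Tokenize on word characters, lowercase, and drop stop words."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        term = match.group().lower()
        if term in STOP_WORDS:
            continue  # removed, but its position is still counted
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

tokens = analyze("this is a test baby")
print(tokens)
```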

Autocomplete fields

Queries & Filters

Queries:
● full text search
● relevance score
● heavy
● not cacheable

Filters:
● exact match
● show or hide
● lightning fast
● cacheable

Combine Filters & Queries

> GET /_search -d

{
  "query": {
    "filtered": {
      "query": {
        "match": { "title": "awesome" }
      },
      "filter": {
        "term": { "type": "article" }
      }
    }
  }
}

and Sorting

> GET /_search -d

{
  "query": {
    "filtered": {
      "query": {
        "match": { "title": "awesome" }
      },
      "filter": {
        "term": { "type": "article" }
      }
    }
  },
  "sort": { "date": "desc" }
}
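The nesting is easier to keep straight when the body is assembled in code. A sketch, assuming the `filtered` query form shown in these slides; `filtered_search` is a hypothetical helper name.

```python
import json

def filtered_search(field, text, filter_field, filter_value,
                    sort_field=None, order="desc"):
    """Combine a scored match query with an exact-match term filter."""
    body = {
        "query": {
            "filtered": {
                "query": {"match": {field: text}},
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }
    # "sort" sits next to "query" at the top level, not inside it.
    if sort_field is not None:
        body["sort"] = {sort_field: order}
    return body

body = filtered_search("title", "awesome", "type", "article", sort_field="date")
print(json.dumps(body, indent=2))
```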

Relevance. Explain API

Term frequency

How often does the term appear in the field? The more often, the more relevant.

Inverse document frequency

How often does each term appear in the index? The more often, the less relevant.

Field norm

How long is the field? The longer it is, the less likely it is that words in the field will be relevant.
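The three factors can be written down as small functions. This sketch assumes the formulas of Lucene's classic TF/IDF similarity; the real scorer combines them with boosts and query normalization, which are left out here.

```python
import math

def term_frequency(freq):
    """More occurrences in the field -> higher, with diminishing returns."""
    return math.sqrt(freq)

def inverse_document_frequency(num_docs, doc_freq):
    """Terms found in many documents -> lower weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms):
    """Longer fields -> lower weight per matching term."""
    return 1.0 / math.sqrt(num_terms)

rare = inverse_document_frequency(num_docs=100, doc_freq=1)
common = inverse_document_frequency(num_docs=100, doc_freq=50)
print(term_frequency(4), field_norm(4), rare > common)
```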

and Facets

Facets on Amazon

> GET /_search -d

{
  "facets": {
    "home_team": {
      "terms": {
        "field": "field_home_team"
      }
    }
  }
}

Give your facet a name
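Conceptually, a terms facet counts how often each value of a field occurs across the matching documents. The toy version below works on an in-memory list; the sample documents are made up, and the meaning given to `missing`, `total` and `other` is an assumption about the facet response, not a reimplementation of it.

```python
from collections import Counter

def terms_facet(docs, field, size=2):
    """Count field values and report the `size` most frequent terms."""
    values = [doc[field] for doc in docs if doc.get(field) is not None]
    top = Counter(values).most_common(size)
    return {
        "_type": "terms",
        "missing": len(docs) - len(values),               # docs without the field
        "total": len(values),                             # all counted values
        "other": len(values) - sum(c for _, c in top),    # values outside the top terms
        "terms": [{"term": t, "count": c} for t, c in top],
    }

# Made-up sample data for illustration.
docs = [
    {"field_home_team": "hou"}, {"field_home_team": "hou"},
    {"field_home_team": "hou"}, {"field_home_team": "sln"},
    {"field_home_team": "sln"}, {"field_home_team": "nyn"},
    {"title": "no team here"},
]

result = terms_facet(docs, "field_home_team", size=2)
print(result)
```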

Your facet filter can be:

● Terms
● Range
● Histogram
● Date Histogram
● Filter
● Query
● Statistical
● Terms Stats
● Geo Distance

"facets" : {
  "home_team" : {
    "_type" : "terms",
    "missing" : 203,
    "total" : 100,
    "other" : 42,
    "terms" : [
      { "term" : "hou", "count" : 8 },
      { "term" : "sln", "count" : 6 },
      ...

STOP! I want this in Drupal?

Development directions:

1. Search API implementation
2. Field Storage API
3. Alternative backends

Available modules:

● Elasticsearch
● Elasticsearch Connector
● Search API elasticsearch

Field Storage API implementation

Elasticsearch field storage sandbox by Damien Tournoud
Started in July 2011

Elasticsearch EntityFieldQuery sandbox https://drupal.org/sandbox/asgorobets/2073151

Let's DEMO

Let the Search be with you