Elasto Mania

Post on 15-Jan-2015

description

A gentle introduction to Elasticsearch

transcript


elasto mania

@about_andrefs

2014


what is it?

Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine.

elasticsearch.org/overview/

talk disclaimers

• introduction to ES (sorry, no heavy stuff)
• focused on Elasticsearch itself (not so much on integration with Kibana, Logstash, etc.)
• heavily based on Andrew Cholakian’s book Exploring Elasticsearch
• Tiririca method
• not all disclaimers have necessarily been disclaimed

getting started

buzzword driven slide

• real time analytics
• conflict management
• per-operation persistence
• document oriented
• built on top of Apache Lucene™
• Apache 2 Open Source License
• real time data
• distributed
• multi-tenancy
• RESTful API
• schema free
• full text search
• high availability

use cases

• search a large number of product descriptions for a specific phrase and return the best results
• search for words that sound like a given word
• auto-complete a search box with previously issued searches, allowing for misspellings
• store large quantities of semi-structured (JSON) data in a distributed fashion, with redundancy


don’t use cases

• calculate how many items are left in an inventory
• figure out the sum of all items in a given month’s invoices
• execute operations transactionally with rollback support
• guarantee item uniqueness across multiple fields


history

2004: Shay Banon creates Compass (Java search engine framework)
2009: big parts of Compass would need to be rewritten to release a third version focused on scalability
Feb 2010: Elasticsearch 0.4.0
Mar 2012: Elasticsearch 0.19.0
Apr 2013: Elasticsearch 0.90.0
Feb 2014: Elasticsearch 1.0.0
Mar 2014: Elasticsearch 1.1.0

the basics


JSON over HTTP

• primary data format for ES is JSON
• main protocol consists of HTTP requests with JSON payload
• _id is unique, and generated automatically if unassigned
• internally, JSON is converted to flat fields for Lucene’s key/value API
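
The last point can be pictured with a small sketch. The `flatten` helper below is hypothetical (it is not part of Elasticsearch or any client library) and only illustrates the idea of turning a nested JSON object into flat, dotted field names; how fields are really produced depends on the mapping.

```python
def flatten(obj, prefix=""):
    """Flatten a nested dict into {dotted.path: value} pairs (illustration only)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested objects contribute their own fields under a dotted prefix.
            flat.update(flatten(value, path))
        else:
            # Scalars and arrays are kept as the value of a single flat field.
            flat[path] = value
    return flat


song = {
    "title": "The Vampyre of Time and Memory",
    "album": {"title": "...Like Clockwork", "year": 2013},
    "genres": ["alternative rock", "piano rock"],
}
flat_song = flatten(song)
```

Here `flat_song` has keys such as `album.title` and `album.year` instead of a nested `album` object.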

mnemonic

relational DB        Elasticsearch
database             index
table                type
schema definition    mapping
column               field
row                  document

elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html

documents

• like a row in a table in an RDB
• JSON objects
• each is stored in an index, has a type and an id
• each contains zero or more fields

sample document

PUT /music/songs/1

{
  "_id" : 1,
  "title" : "The Vampyre of Time and Memory",
  "author" : "Queens of the Stone Age",
  "album" : {
    "title" : "...Like Clockwork",
    "year" : 2013,
    "track" : 3
  },
  "genres" : ["alternative rock", "piano rock"]
}

fields

• key-value pairs
• value can be a scalar or a nested structure
• each field has a type, defined in a mapping

types

type         definition
string       text
integer      32-bit integers
long         64-bit integers
float        IEEE floats
double       double precision floats
boolean      true or false
date         UTC Date/Time
geo_point    latitude/longitude
null         the value null
array        any field
object       type omitted, properties field
nested       separate document

mapping

• defines the types of a document’s fields
• and the way they are indexed
• scopes _ids (documents with different types may have identical _ids)
• defines a bunch of index-wide settings
• can be defined explicitly or automatically when a document is indexed

sample mapping

PUT /music/songs/_mapping

{
  "song" : {
    "properties" : {
      "title" : { "type" : "string" },
      "author" : { "type" : "string" },
      "album" : {
        "properties" : {
          "title" : { "type" : "string" },
          "year" : { "type" : "integer" },
          "number" : { "type" : "integer" }
        }
      },
      "genres" : { "type" : "string" }
    }
  }
}

indexes

• like a database in an RDB
• has a mapping which defines types
• logical namespace
• maps to one or more primary shards
• can have zero or more replica shards

CRUD I

PUT /music

PUT /music/songs/_mapping

{
  "song" : {
    "properties" : {
      ...
    }
  }
}

CRUD II

PUT /music/songs/1

{
  "title" : "The Vampyre of Time and Memory",
  ...
}

GET /music/songs/1

POST /music/songs/1/_update

{ "doc" : { "year" : 2014 }}

DELETE /music/songs/1
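
As a rough sketch, the same four calls could be prepared from Python with nothing but the standard library. `es_request` is a hypothetical helper, and `BASE_URL` assumes a node listening on localhost:9200; the requests are only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON payload."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


requests_to_send = [
    es_request("PUT", "/music/songs/1", {"title": "The Vampyre of Time and Memory"}),
    es_request("GET", "/music/songs/1"),
    es_request("POST", "/music/songs/1/_update", {"doc": {"year": 2014}}),
    es_request("DELETE", "/music/songs/1"),
]

# To run them against a live node, each request could be passed to
# urllib.request.urlopen(req); that step is left out so the sketch runs offline.
```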


search


search fundamentals

1. boolean search
2. scoring
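
A toy model of the two fundamentals, assuming made-up documents and a deliberately naive score (summed term counts). Lucene's real scoring is considerably more elaborate; this only shows the split between "which documents match" and "in what order".

```python
from collections import Counter, defaultdict

docs = {
    1: "laser beam hits the spaceship",
    2: "the spaceship fires a laser and another laser",
    3: "a quiet walk on the beach",
}

# Inverted index: term -> {doc id: number of occurrences in that doc}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, count in Counter(text.split()).items():
        index[term][doc_id] = count


def search(terms):
    """Boolean AND over the terms, then rank matches by summed term counts."""
    postings = [set(index.get(t, {})) for t in terms]
    matching = set.intersection(*postings) if postings else set()
    scored = [(sum(index[t][d] for t in terms), d) for d in matching]
    # Highest score first; ties broken by doc id for a stable order.
    return [d for score, d in sorted(scored, key=lambda s: (-s[0], s[1]))]


results = search(["laser", "spaceship"])
```

Both documents 1 and 2 pass the boolean step; document 2 ranks first because "laser" occurs twice in it.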

ES Search API

Includes:
• Query DSL
• Filter API
• Facet API
• Sort API
• …

• /index/_search
• /index/type/_search

filters

filtered queries: nested in the query field; affect both query results and facet counts

top-level filters: specified at the root of the search; only affect the query results (hits), not facet counts

facet level filters: pre-filter data before it is aggregated; only affect one specific facet
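
The difference between the three levels can be mimicked in plain Python. This is a toy simulation over made-up documents, not Elasticsearch code; `likes_coding` and `facet_counts` are hypothetical helpers standing in for a filter and a terms facet.

```python
from collections import Counter

people = [
    {"handle": "ana", "hobbies": ["coding", "chess"]},
    {"handle": "rui", "hobbies": ["coding", "surf"]},
    {"handle": "zed", "hobbies": ["surf"]},
]


def likes_coding(doc):
    return "coding" in doc["hobbies"]


def facet_counts(docs):
    """Count how often each hobby appears in the given documents."""
    return Counter(h for d in docs for h in d["hobbies"])


# Filtered query: the filter narrows the set that feeds both hits and facets.
fq_hits = [d for d in people if likes_coding(d)]
fq_facets = facet_counts(fq_hits)

# Top-level filter: facets are computed on the unfiltered query results,
# and only the returned hits are narrowed afterwards.
tl_facets = facet_counts(people)
tl_hits = [d for d in people if likes_coding(d)]

# Facet-level filter: hits are untouched, only this one facet is pre-filtered.
fl_hits = people
fl_facets = facet_counts(d for d in people if likes_coding(d))
```

Note how "surf" is counted once under the filtered query and the facet-level filter, but twice under the top-level filter.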


search sample I

POST /music/_search

{
  "query" : {
    "fuzzy" : { "title" : "vampires" }
  }
}
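
Fuzzy matching is based on edit distance between terms. The sketch below is a plain Levenshtein implementation written for illustration, not the engine's own code; the exact distance allowed by a fuzzy query depends on the Elasticsearch version and the query's settings.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


distance = levenshtein("vampires", "vampyre")
```

The searched word "vampires" is two edits away from the indexed term "vampyre" (one substitution, one deletion).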

search sample II

POST /planet/_search

{
  "from" : 0,
  "size" : 15,
  "query" : { "match_all" : {} },
  "sort" : { "handle" : "desc" },
  "filter" : { "term" : { "_all" : "coding" }},
  "facets" : {
    "hobbies" : {
      "terms" : { "field" : "hobbies" }
    }
  }
}

analysis

• performed when documents are added
• manipulates data to ensure better indexing
• 3 steps:
  1. character filtering
  2. tokenization
  3. token filtering
• distinct analyzers for each field
• multiple analyzers for each field
• custom analyzers
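
The three steps can be sketched as a tiny pipeline. Everything here is an assumption made for illustration (the `&` replacement, the splitting rule, the stopword list); the built-in analyzers behave differently in detail.

```python
import re


def char_filter(text):
    """Step 1: clean up raw characters (here: replace '&' with 'and')."""
    return text.replace("&", " and ")


def tokenize(text):
    """Step 2: split the text into tokens on non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]


STOPWORDS = {"the", "of", "and"}  # assumption: a tiny illustrative stopword list


def token_filter(tokens):
    """Step 3: normalise tokens (lowercase) and drop stopwords."""
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


terms = analyze("The Vampyre of Time & Memory")
```

With these toy rules, the title is reduced to the terms `vampyre`, `time` and `memory`.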

analyzers

PUT /music/songs/_mapping

{
  "song" : {
    "properties" : {
      "title" : {
        "type" : "string",
        "fields" : {
          "title_exact" : { "type" : "string", "index" : "not_analyzed" },
          "title_simple" : { "type" : "string", "analyzer" : "simple" },
          "title_snow" : { "type" : "string", "analyzer" : "snowball" }
        }
      },
      ...
    }
  }
}

highlighting

POST /publications/books/_search

{
  "query" : {
    "match" : { "text" : "spaceship" }
  },
  "fields" : ["title", "isbn"],
  "highlight" : {
    "fields" : {
      "text" : { "number_of_fragments" : 3 }
    }
  }
}

search phrases

POST /publications/books/_search

{
  "query" : {
    "match_phrase" : { "text" : "laser beam" }
  },
  "fields" : ["title", "isbn"],
  "highlight" : {
    "fields" : {
      "text" : { "number_of_fragments" : 3 }
    }
  }
}

going wild


aggregations

Unit of work that builds analytic information over a set of documents.

bucketing: documents are evaluated and placed into buckets according to previously defined criteria

metric: keeps track of metrics which are computed over a set of documents
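
The two kinds of aggregation can be imitated in plain Python. The song data below is made up for the example, and the code is a sketch of the concept, not of the aggregations API.

```python
from collections import defaultdict
from statistics import mean

songs = [
    {"title": "Song A", "genre": "rock", "year": 2013},
    {"title": "Song B", "genre": "rock", "year": 2007},
    {"title": "Song C", "genre": "jazz", "year": 1959},
]

# Bucketing: place each document into a bucket chosen by a criterion (its genre).
buckets = defaultdict(list)
for song in songs:
    buckets[song["genre"]].append(song)

# Metric: compute a value over the documents of each bucket.
doc_counts = {genre: len(docs) for genre, docs in buckets.items()}
avg_year = {genre: mean(d["year"] for d in docs) for genre, docs in buckets.items()}
```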


percolations


more stuff

• routing
• uri search
• suggesters
• count API
• validate API
• explain API
• more like this API
• …

scalability


tools


Logstash


Kibana


Marvel


what about now

new features

2014
Apr 3rd: count
Mar 6th: Tribe nodes
Jan 17th: the cat API
Jan 29th: Marvel
Jan 21st: snapshot & restore

2013
Sep 24th: official Elasticsearch clients for Ruby, Python, PHP and Perl
Nov 28th: Lucene 4.x doc values
…

go read a book

• Exploring Elasticsearch, Andrew Cholakian
• Elasticsearch – The Definitive Guide, Clinton Gormley, Zachary Tong

getting in touch

• https://github.com/elasticsearch
• @elasticsearch
• irc.freenode.org #elasticsearch
• irc.perl.org #elasticsearch
• http://www.elasticsearch.org/blog/
• Elasticsearch User mailing list

references

• Elastic Search Mega Manual
• http://solr-vs-elasticsearch.com/
• Elastic Search in Production
• Exploring Elasticsearch, Andrew Cholakian
• Elasticsearch – The Definitive Guide, Clinton Gormley, Zachary Tong

job’s done

questions?