
Elasto Mania

Date post: 15-Jan-2015
Upload: andrefsantos
Description:
A gentle introduction to Elasticsearch
Transcript
Page 1: Elasto Mania


elasto mania

@about_andrefs

2014

Page 2: Elasto Mania
Page 3: Elasto Mania

what is it?

Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine.

elasticsearch.org/overview/

Page 4: Elasto Mania

talk disclaimers

• introduction to ES (sorry, no heavy stuff)
• focused on Elasticsearch itself (not so much on integration with Kibana, Logstash, etc.)
• heavily based on Andrew Cholakian’s book Exploring Elasticsearch
• Tiririca method
• not all disclaimers have necessarily been disclaimed

Page 5: Elasto Mania

getting started

Page 6: Elasto Mania

buzzword driven slide

• real time analytics
• conflict management
• per-operation persistence
• document oriented
• built on top of Apache Lucene™
• Apache 2 Open Source License
• real time data
• distributed
• multi-tenancy
• RESTful API
• schema free
• full text search
• high availability

Page 7: Elasto Mania

use cases

• search a large number of product descriptions for a specific phrase and return the best results
• search for words that sound like a given word
• auto-complete a search box with previously issued searches, allowing for misspellings
• store large quantities of semi-structured (JSON) data in a distributed fashion, with redundancy

Page 11: Elasto Mania

don’t use cases

• calculate how many items are left in an inventory
• figure out the sum of all items in a given month’s invoices
• execute operations transactionally with rollback support
• guarantee item uniqueness across multiple fields

Page 15: Elasto Mania

history

2004: Shay Banon creates Compass (Java search engine framework)
2009: big parts of Compass would need to be rewritten to release a third version focused on scalability
Feb 2010: Elasticsearch 0.4.0
Mar 2012: Elasticsearch 0.19.0
Apr 2013: Elasticsearch 0.90.0
Feb 2014: Elasticsearch 1.0.0
Mar 2014: Elasticsearch 1.1.0

Page 16: Elasto Mania


the basics

Page 17: Elasto Mania

JSON over HTTP

• primary data format for ES is JSON
• main protocol consists of HTTP requests with JSON payload
• _id is unique, and generated automatically if unassigned
• internally, JSON is converted to flat fields for Lucene’s key/value API
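
The flattening mentioned in the last bullet can be pictured with a small Python sketch. This is only an illustration of the idea, not the code Elasticsearch or Lucene actually run; the `flatten` helper and the dotted naming are assumptions made for the example.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted field names (illustrative helper)."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects contribute their own fields under "parent.child"
            flat.update(flatten(value, prefix=name + "."))
        else:
            # Scalars and arrays are kept as the value of a single flat field
            flat[name] = value
    return flat


song = {
    "title": "The Vampyre of Time and Memory",
    "album": {"title": "...Like Clockwork", "year": 2013},
    "genres": ["alternative rock", "piano rock"],
}
flat_song = flatten(song)
```

Here `flat_song` has the keys `title`, `album.title`, `album.year` and `genres`, with no nested object left.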

Page 18: Elasto Mania

mnemonic

relational DB        Elasticsearch
database             index
table                type
schema definition    mapping
column               field
row                  document

elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html

Page 19: Elasto Mania

documents

• like a row in a table in an RDB
• JSON objects
• each is stored in an index, has a type and an id
• each contains zero or more fields

Page 20: Elasto Mania

sample document

PUT /music/songs/1

{
  "_id" : 1,
  "title" : "The Vampyre of Time and Memory",
  "author" : "Queens of the Stone Age",
  "album" : {
    "title" : "...Like Clockwork",
    "year" : 2013,
    "track" : 3
  },
  "genres" : ["alternative rock", "piano rock"]
}

Page 21: Elasto Mania

fields

• key-value pairs
• value can be a scalar or a nested structure
• each field has a type, defined in a mapping

Page 22: Elasto Mania

types

type         definition
string       text
integer      32-bit integers
long         64-bit integers
float        IEEE floats
double       double precision floats
boolean      true or false
date         UTC Date/Time
geo_point    latitude/longitude
null         the value null
array        any field
object       type omitted, properties field
nested       separate document

Page 23: Elasto Mania

mapping

• defines the types of a document’s fields
• and the way they are indexed
• scopes _ids (documents with different types may have identical _ids)
• defines a bunch of index-wide settings
• can be defined explicitly or automatically when a document is indexed
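
To make the "automatically" part concrete, here is a rough Python sketch of how a mapping could be guessed from a document's JSON values. It is a simplification under stated assumptions: `infer_mapping` is a hypothetical helper, and the real dynamic mapping does more (date detection in strings, for instance).

```python
def infer_mapping(value):
    """Guess a mapping fragment from a JSON value (hypothetical helper)."""
    # bool must be tested before int: bool is a subclass of int in Python
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        # Arrays take the type of their elements; an empty array gives no hint
        return infer_mapping(value[0]) if value else None
    if isinstance(value, dict):
        properties = {}
        for key, child in value.items():
            child_mapping = infer_mapping(child)
            if child_mapping is not None:
                properties[key] = child_mapping
        return {"properties": properties}
    return None  # null carries no type information


guessed = infer_mapping({
    "title": "The Vampyre of Time and Memory",
    "album": {"year": 2013},
    "genres": ["alternative rock", "piano rock"],
})
```

An explicit mapping is still the safer choice when the guessed type (say `long` instead of `integer`) is not the one you want.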

Page 24: Elasto Mania

sample mapping

PUT /music/songs/_mapping

{
  "songs" : {
    "properties" : {
      "title" : { "type" : "string" },
      "author" : { "type" : "string" },
      "album" : {
        "properties" : {
          "title" : { "type" : "string" },
          "year" : { "type" : "integer" },
          "track" : { "type" : "integer" }
        }
      },
      "genres" : { "type" : "string" }
    }
  }
}

Page 25: Elasto Mania

indexes

• like a database in an RDB
• has a mapping which defines types
• logical namespace
• maps to one or more primary shards
• can have zero or more replica shards

Page 26: Elasto Mania

CRUD I

PUT /music

PUT /music/songs/_mapping

{
  "songs" : {
    "properties" : {
      ...
    }
  }
}

Page 27: Elasto Mania

CRUD II

PUT /music/songs/1

{
  "title" : "The Vampyre of Time and Memory",
  ...
}

GET /music/songs/1

POST /music/songs/1/_update

{ "doc" : { "year" : 2014 } }

DELETE /music/songs/1
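
The same four calls can be prepared from Python with nothing but the standard library. This is a sketch, not an official client: `es_request` is a hypothetical helper, and the base URL assumes a node listening on localhost:9200. The requests are only built here; nothing is sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node on the default port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for an Elasticsearch endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


requests_to_send = [
    es_request("PUT", "/music/songs/1", {"title": "The Vampyre of Time and Memory"}),
    es_request("GET", "/music/songs/1"),
    es_request("POST", "/music/songs/1/_update", {"doc": {"year": 2014}}),
    es_request("DELETE", "/music/songs/1"),
]
```

With a live node, each request would be sent with `urllib.request.urlopen(request)` and the JSON response read from the returned object.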

Page 28: Elasto Mania


search

Page 29: Elasto Mania

search fundamentals

1. boolean search
2. scoring
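
A toy Python sketch of the two ideas: first decide which documents match (boolean search), then rank the matches (scoring). The sample documents are made up, and the score is a plain sum of term frequencies, which is an assumption for illustration; Lucene's real scoring is more elaborate.

```python
from collections import Counter, defaultdict

# Made-up documents for the example
docs = {
    1: "laser beam hits the spaceship",
    2: "the spaceship leaves",
    3: "laser laser laser show",
}

# Inverted index: term -> {document id -> term frequency}
index = defaultdict(Counter)
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] += 1


def boolean_and(terms):
    """Boolean search: ids of the documents that contain every term."""
    postings = [set(index[term]) for term in terms]
    return set.intersection(*postings) if postings else set()


def score(doc_id, terms):
    """Toy score: sum of the term frequencies of the query terms."""
    return sum(index[term][doc_id] for term in terms)


def search(terms):
    """Match first, then sort the matches by descending score."""
    matches = boolean_and(terms)
    return sorted(matches, key=lambda doc_id: score(doc_id, terms), reverse=True)
```

For example, `search(["laser"])` ranks document 3 ahead of document 1, because the term occurs three times in it.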

Page 30: Elasto Mania

ES Search API

Includes:
• Query DSL
• Filter API
• Facet API
• Sort API
• …

Endpoints:
• /index/_search
• /index/type/_search

Page 31: Elasto Mania

filters

filtered queries: nested in the query field; affect both query results and facet counts

top-level filters: specified at the root of the search; affect only query results, not facet counts

facet-level filters: pre-filter data before it is aggregated; affect only one specific facet
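
The three placements can be compared side by side as search bodies, written here as Python dicts. This follows the Elasticsearch 1.x syntax the talk covers; the field names (`title`, `genres`) and the filter itself are made up for the example.

```python
import json

genre_filter = {"term": {"genres": "rock"}}  # hypothetical filter
title_query = {"match": {"title": "vampyre"}}  # hypothetical query
genres_facet = {"terms": {"field": "genres"}}

# 1. filtered query: the filter is nested inside "query",
#    so it narrows both the hits and the facet counts
filtered_query = {
    "query": {"filtered": {"query": title_query, "filter": genre_filter}},
    "facets": {"genres": genres_facet},
}

# 2. top-level filter: sits at the root of the search body,
#    so it narrows the hits but leaves the facet counts alone
top_level = {
    "query": title_query,
    "filter": genre_filter,
    "facets": {"genres": genres_facet},
}

# 3. facet-level filter: attached to one facet only
facet_level = {
    "query": title_query,
    "facets": {"genres": dict(genres_facet, facet_filter=genre_filter)},
}

bodies = [json.dumps(b) for b in (filtered_query, top_level, facet_level)]
```

Each entry of `bodies` is a JSON string that could be posted to an `/index/_search` endpoint.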

Page 32: Elasto Mania

search sample I

POST /music/_search

{
  "query" : {
    "fuzzy" : { "title" : "vampires" }
  }
}

Page 33: Elasto Mania

search sample II

POST /planet/_search

{
  "from" : 0,
  "size" : 15,
  "query" : { "match_all" : {} },
  "sort" : { "handle" : "desc" },
  "filter" : { "term" : { "_all" : "coding" } },
  "facets" : {
    "hobbies" : {
      "terms" : { "field" : "hobbies" }
    }
  }
}

Page 34: Elasto Mania

analysis

• performed when documents are added
• manipulates data to ensure better indexing
• 3 steps:
  1. character filtering
  2. tokenization
  3. token filtering
• distinct analyzers for each field
• multiple analyzers for each field
• custom analyzers
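
The three steps can be mimicked with a few lines of Python. This is a toy analyzer written for illustration only: the tag-stripping rule, the splitting rule and the stopword list are all assumptions, not what any built-in Elasticsearch analyzer does.

```python
import re

STOPWORDS = {"the", "of", "and"}  # assumption: tiny stopword list for the example


def char_filter(text):
    """Step 1, character filtering: replace HTML-like tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Step 2, tokenization: split on anything that is not a letter or digit."""
    return [token for token in re.split(r"[^A-Za-z0-9]+", text) if token]


def token_filter(tokens):
    """Step 3, token filtering: lowercase the tokens and drop stopwords."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    """Run the three steps in order."""
    return token_filter(tokenize(char_filter(text)))


terms = analyze("<b>The Vampyre</b> of Time and Memory")
```

The resulting `terms` are what would end up in the index for that field, which is why a query usually has to go through the same analysis to match.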

Page 35: Elasto Mania

analyzers

PUT /music/songs/_mapping

{
  "songs" : {
    "properties" : {
      "title" : {
        "type" : "string",
        "fields" : {
          "title_exact" : { "type" : "string", "index" : "not_analyzed" },
          "title_simple" : { "type" : "string", "analyzer" : "simple" },
          "title_snow" : { "type" : "string", "analyzer" : "snowball" }
        }
      },
      ...
    }
  }
}

Page 36: Elasto Mania

highlighting

POST /publications/books/_search

{
  "query" : {
    "match" : { "text" : "spaceship" }
  },
  "fields" : ["title", "isbn"],
  "highlight" : {
    "fields" : {
      "text" : { "number_of_fragments" : 3 }
    }
  }
}

Page 37: Elasto Mania

search phrases

POST /publications/books/_search

{
  "query" : {
    "match_phrase" : { "text" : "laser beam" }
  },
  "fields" : ["title", "isbn"],
  "highlight" : {
    "fields" : {
      "text" : { "number_of_fragments" : 3 }
    }
  }
}

Page 38: Elasto Mania


going wild

Page 39: Elasto Mania

aggregations

Unit of work that builds analytic information over a set of documents.

bucketing: documents are evaluated and placed into buckets according to previously defined criteria

metric: keep track of metrics which are computed over a set of documents
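
The two families can be imitated over an in-memory list of documents. This Python sketch only shows the concept; the sample songs, `bucket_by` and `average` are made up for the example and are not part of any Elasticsearch API.

```python
from collections import defaultdict

# Made-up documents for the example
songs = [
    {"title": "A", "genre": "rock", "year": 2013},
    {"title": "B", "genre": "rock", "year": 2011},
    {"title": "C", "genre": "jazz", "year": 1959},
]


def bucket_by(docs, field):
    """Bucketing: place each document into the bucket named by its field value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def average(docs, field):
    """Metric: a single number computed over a set of documents."""
    return sum(doc[field] for doc in docs) / len(docs)


buckets = bucket_by(songs, "genre")
doc_counts = {genre: len(members) for genre, members in buckets.items()}
avg_year_per_genre = {genre: average(members, "year") for genre, members in buckets.items()}
```

Computing a metric inside each bucket, as in `avg_year_per_genre`, is the kind of nesting that makes aggregations more flexible than plain facet counts.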

Page 40: Elasto Mania


percolations

Page 41: Elasto Mania

more stuff

• routing
• uri search
• suggesters
• count API
• validate API
• explain API
• more like this API
• …

Page 42: Elasto Mania


scalability

Page 43: Elasto Mania
Page 44: Elasto Mania


tools

Page 45: Elasto Mania


Logstash

Page 46: Elasto Mania


Kibana

Page 47: Elasto Mania


Marvel

Page 48: Elasto Mania

what about now

Page 49: Elasto Mania

new features

2014
Apr 3rd: count
Mar 6th: Tribe nodes
Jan 17th: the cat API
Jan 29th: Marvel
Jan 21st: snapshot & restore

2013
Sep 24th: official Elasticsearch clients for Ruby, Python, PHP and Perl
Nov 28th: Lucene 4.x doc values
…

Page 50: Elasto Mania

go read a book

• Exploring Elasticsearch, Andrew Cholakian
• Elasticsearch – The Definitive Guide, Clinton Gormley, Zachary Tong

Page 51: Elasto Mania

getting in touch

• https://github.com/elasticsearch
• @elasticsearch
• irc.freenode.org #elasticsearch
• irc.perl.org #elasticsearch
• http://www.elasticsearch.org/blog/
• Elasticsearch User mailing list

Page 52: Elasto Mania

references

• Elastic Search Mega Manual
• http://solr-vs-elasticsearch.com/
• Elastic Search in Production
• Exploring Elasticsearch, Andrew Cholakian
• Elasticsearch – The Definitive Guide, Clinton Gormley, Zachary Tong

Page 53: Elasto Mania


job’s done

questions?

