Date posted: 13-Jan-2017
Category: Data & Analytics
Uploaded by: ricardo-peres
Introduction
NoSQL database for indexing JSON contents
Documents are indexed as they are added (< 1s)
Schema-less (kind of…)
Distributed
High performance
REST semantics
Graph capabilities
Based on Lucene
Part of the ELK stack
Open source!
Cluster
A collection of servers (nodes) running Elasticsearch
Single master
Multicast-based discovery (can be explicit)
Shards
Indexes are distributed across shards; the default is 5 shards and 1 replica (cluster)
Defined at index creation time
Transparent to the user
It is possible to define a hashing function
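The reason the shard count is fixed at index creation can be sketched in a few lines: a document is placed on the shard given by a hash of its routing value (the document id by default) modulo the number of primary shards, so changing that number would send existing documents to different shards. The sketch below is illustrative only. `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in for the hash Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int = 5) -> int:
    """Map a routing value (by default the document id) to a shard number."""
    # Stand-in hash for illustration; Elasticsearch uses its own hash function.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    # The modulus is the number of primary shards, fixed when the index is created.
    return hashed % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3", "blog-42"]:
        print(doc_id, "-> shard", pick_shard(doc_id))
```

The same id always maps to the same shard, which is what makes the distribution transparent to the user.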
Documents
Self-contained data
Exist in a type
Have an id
Have a version
Have a schema
Can have expiration
Fields
Documents are structured in fields
Special fields: _id, _uid, _index, _type
_timestamp, _all, _source, _ttl, _meta, _parent, _routing are optional
Data Types
string
long, integer, short, byte, double, float
date
boolean
binary
geo_point
geo_shape
object
nested
ip
completion
token_count
arrays
Creating a Document
Auto Id
POST /website/blog
{
  "title": "My Blog",
  "url": "http://my/blog",
  "tags": [ "development" ]
}
Explicit Id
POST /website/blog/1
{
  "title": "My Blog",
  "url": "http://my/blog",
  "tags": [ "development" ]
}
Updating a Document
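Partial (the fields under "doc" are merged into the stored document)
POST /website/blog/1/_update
{
  "doc": {
    "tags": [ "testing" ],
    "views": 1
  }
}
Full (the stored document is replaced by re-indexing it under the same id)
PUT /website/blog/1
{
  "title": "My Blog",
  "url": "http://my/blog",
  "tags": [ "testing" ]
}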
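The difference between the two kinds of update can be illustrated on plain dictionaries. This is a sketch of the semantics only, not of the Elasticsearch implementation: `partial_update` and `full_update` are hypothetical helpers, and the merge rule assumed here is that nested objects are merged while other values, arrays included, are overwritten.

```python
import copy


def partial_update(source: dict, doc: dict) -> dict:
    """Merge the fields of `doc` into a copy of the stored `source`."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars and arrays simply replace the old value.
            merged[key] = copy.deepcopy(value)
    return merged


def full_update(source: dict, new_source: dict) -> dict:
    """Replace the stored document entirely; fields not sent again are lost."""
    return copy.deepcopy(new_source)


if __name__ == "__main__":
    stored = {"title": "My Blog", "url": "http://my/blog", "tags": ["development"]}
    print(partial_update(stored, {"tags": ["testing"], "views": 1}))
    print(full_update(stored, {"title": "My Blog", "tags": ["testing"]}))
```

After the partial update the `url` field is still present; after the full update it is gone because it was not sent again.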
Mappings
Created at index or type level, implicitly or explicitly
Cannot modify, only add
Can enforce schema or not
PUT website
{
  "mappings": {
    "blog": {
      "_timestamp": {
        "enabled": true
      },
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "standard"
        }
      }
    }
  }
}
Mapping Templates
Automatically apply mappings to new fields
PUT website
{
  "mappings": {
    "post": {
      "date_detection": true,
      "dynamic_date_formats": [ "yyyy-MM-dd HH:mm", "yyyy-MM-dd" ],
      "dynamic_templates": [
        {
          "timestamp": {
            "match": "timestamp",
            "match_mapping_type": "date",
            "mapping": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm"
            }
          }
        }
      ]
    }
  }
}
Query and Filter Context
Queries: determine the scoring of the results
Filters: determine what appears in the results
Filters are cached
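A request body that uses both contexts might look like the sketch below. It assumes Elasticsearch 2.x or later, where a `bool` query accepts a `filter` clause; the field names `title` and `tags` are borrowed from the blog examples, and `build_search_body` is a hypothetical helper that only builds the JSON and sends nothing.

```python
import json


def build_search_body(text: str, tag: str) -> dict:
    """Combine a scored clause (query context) with a yes/no clause (filter context)."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to the relevance score.
                "must": [{"match": {"title": text}}],
                # Filter context: only includes or excludes documents, and can be cached.
                "filter": [{"term": {"tags": tag}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("blog", "testing"), indent=2))
```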
Querying
Search API
Uses the URL
Starting with <index> and <type> is optional
/<index>/<type>/_search?q=something
/<index>/<type1>,<type2>/_search?q=something
_search?q=something
_search?q=field:value
_search?q=+firstname:(john mary) -surname:smith
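Query strings containing `+`, spaces, colons or parentheses have to be percent-encoded before they go into a URL. A minimal sketch using only the Python standard library follows; `search_url` is a hypothetical helper, and the index and type names are placeholders.

```python
from urllib.parse import quote


def search_url(query, index=None, types=None):
    """Build a URI search path; index and types are optional, as in the Search API."""
    parts = []
    if index:
        parts.append(index)
        if types:
            parts.append(",".join(types))
    parts.append("_search")
    # safe="" makes quote() encode every reserved character, including "+" and ":".
    return "/" + "/".join(parts) + "?q=" + quote(query, safe="")


if __name__ == "__main__":
    print(search_url("something", "website", ["blog"]))
    print(search_url("+firstname:(john mary) -surname:smith"))
```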
Query DSL
Query and filter context
simple_query_string, query_string, match, term, terms, range, multi_match, match_phrase, missing, exists, regexp, fuzzy, prefix, ids
bool, dis_max
more_like_this, script, template
Pagination, Sorting and Projection
size, from
sort
fields
POST website/post/_search
{
  "size": 10,
  "from": 0,
  "sort": {
    "timestamp": {
      "order": "desc"
    }
  },
  "fields": [ "title", "_id" ]
}
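The relation between a page number and the `from` offset is simple arithmetic: `from = (page - 1) * size`. The sketch below builds the request body shown above for an arbitrary page; `page_request` is a hypothetical helper and pages are assumed to be numbered from 1.

```python
def page_request(page: int, page_size: int = 10) -> dict:
    """Build a search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "size": page_size,
        # Number of hits to skip before the first one returned.
        "from": (page - 1) * page_size,
        "sort": {"timestamp": {"order": "desc"}},
        "fields": ["title", "_id"],
    }


if __name__ == "__main__":
    print(page_request(1))
    print(page_request(3, page_size=20))
```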
Percolator
Search in reverse: first define the query, then add documents to it
Querying a document gives all percolator queries that it matches
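The "search in reverse" idea can be shown with a toy in-memory model. This is not the Elasticsearch percolator API, only an illustration of the principle: queries are stored first, and each incoming document is tested against all of them. The names `register_query` and `percolate` are hypothetical, and a "query" here is reduced to a single term that must occur in a field.

```python
# Toy model: query id -> (field, term). Not the Elasticsearch API.
registered_queries = {}


def register_query(query_id, field, term):
    """Store a query ahead of time, before any document is seen."""
    registered_queries[query_id] = (field, term.lower())


def percolate(document):
    """Return the ids of all stored queries that the document matches."""
    matches = []
    for query_id, (field, term) in registered_queries.items():
        words = str(document.get(field, "")).lower().split()
        if term in words:
            matches.append(query_id)
    return sorted(matches)


if __name__ == "__main__":
    register_query("q-testing", "title", "testing")
    register_query("q-release", "title", "release")
    print(percolate({"title": "My testing blog"}))
```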
Relations
No joins, but some alternatives
Parent/child: has_child, has_parent
Nested objects
Terms filter lookup: terms with type and id
Relevance
Term Frequency / Inverse Document Frequency / Field Length Norm
Custom scores
A match hit/miss can be explained
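The three factors can be written down using the formulas of Lucene's classic TF/IDF similarity, as described in the Elasticsearch Definitive Guide. The sketch below is a simplification: it computes the weight of a single term in a single field and leaves out query normalization, the coordination factor and boosts. The function names are hypothetical.

```python
import math


def term_frequency(freq: int) -> float:
    """More occurrences of the term in the field give a higher weight."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    """Terms that occur in many documents carry less weight."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """A match in a short field weighs more than one in a long field."""
    return 1 / math.sqrt(num_terms)


def term_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Weight of one term in one field of one document."""
    return (term_frequency(freq)
            * inverse_document_frequency(num_docs, doc_freq)
            * field_length_norm(num_terms))


if __name__ == "__main__":
    # Term occurs 4 times in a 16-term field; 10 of 1000 documents contain it.
    print(term_weight(freq=4, num_docs=1000, doc_freq=10, num_terms=16))
```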
Index Aliases
Used to refer to one or more indexes, one or more types, possibly with a filter
Useful for "moving indexes" (month, year, country, etc)
POST /_aliases
{
  "actions": [
    {
      "add": {
        "indices": [ "social-2015", "social-2016" ],
        "alias": "social-testing",
        "filter": {
          "term": { "tag": "testing" }
        }
      }
    }
  ]
}
Alias Templates
Creates an alias when an index matching the template pattern is created
POST /_template/social
{
  "order": 0,
  "template": "social-*",
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "mappings": {},
  "aliases": {
    "social": {}
  }
}
Bulk Operations
Perform multiple operations (index, update, delete) at once
POST bulk/data/_bulk
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "2" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "3" } }
{ "field1" : "value1" }
{ "update" : { "_id" : "2" } }
{ "doc": { "field2": "value2" } }
{ "delete" : { "_id" : "3" } }
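The bulk body is newline-delimited JSON: each action line is followed by its payload line (delete has none), every JSON object sits on a single line, and the body ends with a newline. A minimal sketch of building such a body in Python follows; `bulk_body` is a hypothetical helper and nothing is sent to a server.

```python
import json


def bulk_body(operations):
    """Serialize (action, metadata, payload) tuples as newline-delimited JSON."""
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}))
        if payload is not None:  # delete actions carry no payload line
            lines.append(json.dumps(payload))
    # The final newline is required by the bulk endpoint.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ops = [
        ("index", {"_id": "1"}, {"field1": "value1"}),
        ("update", {"_id": "2"}, {"doc": {"field2": "value2"}}),
        ("delete", {"_id": "3"}, None),
    ]
    print(bulk_body(ops), end="")
```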
Analytics
Aggregations
Can be nested
Can use scripts
GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "feature" },
      "aggs": {
        "average_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
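What a terms aggregation with a nested average computes can be emulated on an in-memory list of documents: group by the value of one field, then average another field inside each group. This is only an illustration of the result shape, not of how Elasticsearch computes it; `terms_with_avg` is a hypothetical helper and the sample documents are made up.

```python
from collections import defaultdict


def terms_with_avg(docs, bucket_field, metric_field):
    """Group docs by bucket_field and average metric_field within each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "average": sum(values) / len(values)}
        for key, values in buckets.items()
    }


if __name__ == "__main__":
    sample = [
        {"feature": "a", "price": 10},
        {"feature": "a", "price": 20},
        {"feature": "b", "price": 5},
    ]
    print(terms_with_avg(sample, "feature", "price"))
```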
Logstash
Collect and transform data
Input – Filters – Outputs
Sources/destinations: Elasticsearch, File, Syslog, Windows Eventlog, Redis, RabbitMQ, GitHub, HTTP, Beats, Twitter, WebSocket, …