Elasticsearch 5.0

Elastic 5.0… so much awesomeness!

Matias Cascallares, Solutions Architect

matias@elastic.co

• Made in Argentina, living in Singapore

• Java / Python / NodeJS

• Working with/in open source for the last 8 years

• Using Elasticsearch since 2014, working for Elastic since 2015

• Meme lover

> whoami

The Elastic Stack

It’s Complicated

Store, Index & Analyze

• Resilient; designed for scale-out

• High availability; multitenancy

• Structured & unstructured data

Distributed & Scalable

Developer Friendly

Search & Analytics

• Schemaless

• Native JSON

• Client libraries

• Apache Lucene

• Real-time

• Full-text search

• Aggregations

• Geospatial

• Multilingual

• Lower memory usage & improved cluster stability (new keyword type)

• Better scoring, faster, reduced hardware demand (Okapi BM25)

• IPv6 type support

Keyword

String
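As a rough illustration of the split, the sketch below builds an index-creation body in which the old string type gives way to `text` for analyzed full-text fields and `keyword` for exact values, plus an `ip` field for the IPv6 support mentioned above. The mapping type name and the field names are hypothetical, chosen only for the example.

```python
import json


def build_log_mapping():
    """Return a hypothetical index-creation body using 5.0 field types."""
    return {
        "mappings": {
            "log": {  # hypothetical mapping type name
                "properties": {
                    # analyzed for full-text search
                    "message": {"type": "text"},
                    # stored as a single exact term; suited to filters and aggregations
                    "status": {"type": "keyword"},
                    # the ip type accepts IPv4 and IPv6 addresses
                    "client_ip": {"type": "ip"},
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_log_mapping(), indent=2))
```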

Update To Lucene 6

• Half the disk space

• Twice as fast to ingest

• 25% faster to search

• For numeric and geospatial fields only

• Scaled floats

• Technically a BKD Tree implementation

Lucene Dimensional Fields
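To make the "scaled floats" bullet concrete, here is a minimal sketch of the idea: a `scaled_float` field is declared with a `scaling_factor`, and the value is kept as an integer obtained by multiplying and rounding. The field name `price` and the helper functions are hypothetical, and Python's `round` may treat exact ties differently from Elasticsearch.

```python
# Hypothetical field definition; "scaled_float" and "scaling_factor" are the mapping keys.
PRICE_FIELD = {"price": {"type": "scaled_float", "scaling_factor": 100}}


def encode_scaled(value, scaling_factor):
    """Integer that would be stored for a float value (illustrative only)."""
    return round(value * scaling_factor)


def decode_scaled(stored, scaling_factor):
    """Float recovered from the stored integer."""
    return stored / scaling_factor


if __name__ == "__main__":
    factor = PRICE_FIELD["price"]["scaling_factor"]
    stored = encode_scaled(12.34, factor)
    print(stored, decode_scaled(stored, factor))
```

The trade-off is precision: anything finer than 1 / scaling_factor is lost, in exchange for compact integer storage.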

Some Benchmarks

New Scripting Language: Painless
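For a flavor of Painless, the sketch below builds the body of an update request that increments a counter with a parameterized script. The field name `counter` is hypothetical, and the script is passed under the `inline` key on the assumption that this is the 5.0 form of the request.

```python
def build_update_script(increment):
    """Body for a hypothetical _update call that bumps a 'counter' field."""
    return {
        "script": {
            "lang": "painless",
            # params keeps the script text constant, so it is compiled only once
            "inline": "ctx._source.counter += params.count",
            "params": {"count": increment},
        }
    }


if __name__ == "__main__":
    print(build_update_script(4))
```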

• Aggregation and suggestion results are cached on shard level for instant returns after the first query.

• Combined with a new query rewrite, typical Kibana dashboards that use “last X days” type of queries will improve dramatically.

Shard Request Cache
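A dashboard-style request of the kind described above might look like the following sketch: no hits are requested (`size` is 0), only an aggregation over a "last X days" range. The index pattern, the `@timestamp` field, and the aggregation name are assumptions; `request_cache=true` is the query parameter that asks for caching explicitly.

```python
def build_dashboard_search(days):
    """Aggregation-only search over the last `days` days (hypothetical fields)."""
    return {
        "size": 0,  # only aggregation results are wanted
        "query": {
            "range": {"@timestamp": {"gte": "now-%dd/d" % days, "lte": "now/d"}}
        },
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "@timestamp", "interval": "day"}
            }
        },
    }


def search_path(index_pattern):
    """Search endpoint with the request cache explicitly enabled."""
    return "/%s/_search?request_cache=true" % index_pattern


if __name__ == "__main__":
    print(search_path("logs-*"))
    print(build_dashboard_search(7))
```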

Rollover API

• Indices not based on time, but on size of the data.

• Even if your data sizes are not consistent per day, Elasticsearch will use constant index/shard sizes.

• Set up rules around automatic rollover to a new index, with aliases.
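The rules above can be sketched as a rollover request against a write alias. This assumes the `max_age` and `max_docs` conditions of the rollover API, with document count standing in for data size; the alias name and the thresholds are hypothetical.

```python
def build_rollover(alias, max_age, max_docs):
    """Return (method, path, body) for a rollover call on a write alias."""
    path = "/%s/_rollover" % alias
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", path, body


if __name__ == "__main__":
    # Roll over once the current index is 7 days old or holds 50 million documents.
    print(build_rollover("logs_write", "7d", 50000000))
```

The alias keeps pointing at the newest index, so writers never need to know the concrete index name.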

Shrink API

• Reduce resources on immutable data

• Easily reduce the number of shards to free up resources

• An index can be shrunk to a factor of its original number of shards
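The "factor" constraint is easy to check up front. The sketch below lists the shard counts an index could be shrunk to and builds a shrink request for one of them; the index names are hypothetical, and the helper is an illustration rather than part of any client library.

```python
def valid_shrink_targets(source_shards):
    """Shard counts smaller than the source that divide it evenly."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def build_shrink(source, target, source_shards, target_shards):
    """Return (method, path, body) for a shrink call, validating the factor rule."""
    if target_shards not in valid_shrink_targets(source_shards):
        raise ValueError(
            "%d is not a factor of %d" % (target_shards, source_shards)
        )
    path = "/%s/_shrink/%s" % (source, target)
    body = {"settings": {"index.number_of_shards": target_shards}}
    return "POST", path, body


if __name__ == "__main__":
    print(valid_shrink_targets(8))
    print(build_shrink("logs-2016.10", "logs-2016.10-small", 8, 2))
```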

• Low-level client

• Allows communication through HTTP/S

• Sync and Async semantics

• Connection handling

• Node discovery (sniffer module)

Java REST Client

• Define processing pipelines right in the Elasticsearch cluster.

• Depending on use case, can simplify the architecture

• Has Processors for the most common actions.

• Combine it with Logstash when needed for power & flexibility.

Ingest Node
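A pipeline definition might look like the sketch below, which chains two of the built-in processors: `grok` to extract fields from a raw line and `remove` to drop the original text afterwards. The pipeline id, the `message` field, and the grok pattern are illustrative assumptions.

```python
def build_pipeline():
    """Body for a hypothetical ingest pipeline that parses a raw log line."""
    return {
        "description": "parse access-log lines",
        "processors": [
            {
                "grok": {
                    "field": "message",
                    "patterns": [
                        "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
                        "%{NUMBER:bytes} %{NUMBER:duration}"
                    ],
                }
            },
            # drop the raw line once its parts have been extracted
            {"remove": {"field": "message"}},
        ],
    }


def pipeline_path(pipeline_id):
    """Endpoint used to store the pipeline (sent with PUT)."""
    return "/_ingest/pipeline/%s" % pipeline_id


if __name__ == "__main__":
    print(pipeline_path("access-logs"))
    print(build_pipeline())
```

Documents are then indexed with `?pipeline=access-logs` so they pass through the processors before being stored.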

Bootstrap Checks

• Detects if it’s running in production or development mode

• When running in production, it will now refuse to start under certain conditions that could seriously impact performance, stability, or data integrity

‒ Heap size (initial vs max)

‒ Memory lock (mlockall)

‒ Virtual memory size

‒ File descriptors

‒ Threads

‒ JVM in server mode
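The logic can be pictured with a small sketch. It assumes that "production mode" means the node is bound to a non-loopback address, and it models only two of the checks listed above; the 65536 file-descriptor threshold and the dictionary keys are assumptions made for illustration, not Elasticsearch's actual implementation.

```python
import ipaddress

MIN_FILE_DESCRIPTORS = 65536  # assumed threshold, for illustration only


def is_production(bind_address):
    """Assumption: binding to a non-loopback address means production mode."""
    return not ipaddress.ip_address(bind_address).is_loopback


def startup_failures(node):
    """Names of the checks that would block startup for a hypothetical node."""
    if not is_production(node["bind_address"]):
        return []  # development mode: problems are only warnings
    failures = []
    if node["heap_initial_mb"] != node["heap_max_mb"]:
        failures.append("heap size")
    if node["max_file_descriptors"] < MIN_FILE_DESCRIPTORS:
        failures.append("file descriptors")
    return failures


if __name__ == "__main__":
    node = {
        "bind_address": "10.0.0.5",
        "heap_initial_mb": 1024,
        "heap_max_mb": 4096,
        "max_file_descriptors": 4096,
    }
    print(startup_failures(node))
```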

More Goodies…

• Dots in field names were supported in 1.x and removed in 2.x. 5.0 supports dots in field names again!
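One way to picture the restored behavior is that a dotted field name is treated like a path into nested objects. The helper below is a hypothetical illustration of that equivalence, not Elasticsearch code.

```python
def expand_dots(doc):
    """Expand dotted keys into nested dictionaries."""
    expanded = {}
    for key, value in doc.items():
        parts = key.split(".")
        node = expanded
        for part in parts[:-1]:
            # walk down, creating intermediate objects as needed
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return expanded


if __name__ == "__main__":
    print(expand_dots({"user.name": "kim", "user.age": 3, "ok": True}))
```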

More Goodies…

• New lock method speeds up small-document indexing by up to 15-20%

• New fsync method for increased ingestion speed

• refresh=[true|wait_for] for index, update, delete and bulk APIs

• Migration Helper

‒ Cluster checkup before upgrading

‒ Reindex helper for 1.x indices

‒ Deprecation logging
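The `refresh` parameter mentioned above is passed on the request URL. A minimal sketch, with hypothetical index, type, and id values, that builds the path for an index call and rejects unsupported values:

```python
from urllib.parse import urlencode

ALLOWED_REFRESH = ("true", "false", "wait_for")


def index_path(index, doc_type, doc_id, refresh):
    """Path for indexing one document with an explicit refresh policy."""
    if refresh not in ALLOWED_REFRESH:
        raise ValueError("unsupported refresh value: %r" % (refresh,))
    return "/%s/%s/%s?%s" % (
        index, doc_type, doc_id, urlencode({"refresh": refresh})
    )


if __name__ == "__main__":
    # wait_for returns only after a refresh has made the change visible to
    # search, without forcing an immediate refresh the way refresh=true does.
    print(index_path("logs", "log", "1", "wait_for"))
```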

Version Compatibility

IDX_v1x IDX_v2x IDX_v5x

ES 1.X

ES 2.X

ES 5.X

Website: www.elastic.co

Products: https://www.elastic.co/products

Forums: https://discuss.elastic.co/

Community: https://www.elastic.co/community/meetups

Twitter: @elastic

Thank You.
