Finding the right stuff, an intro to Elasticsearch (at Rug::B)

Post on 14-Feb-2017

Finding the right stuff

Michael Reinsch

an intro to Elasticsearch with Ruby/Rails

at Ruby User Group Berlin, Feb 2016

How does it fit into my app?

Elasticsearch: a black box with a REST API

• Update API: your app pushes updates (updates are fast, but asynchronous)

• Search API: returns search results
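As a rough illustration of the two calls, the sketch below builds (but does not send) the corresponding HTTP requests with Ruby's standard library. The host `localhost:9200`, the index `events`, the type `event` and the helper names are assumptions made for this example, and the type-based path matches the Elasticsearch versions current at the time of the talk.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumed base URL of a local Elasticsearch node (hypothetical for this sketch).
ES_URL = URI('http://localhost:9200')

# Update API: PUT a JSON document under /<index>/<type>/<id>.
def build_index_request(index, type, id, document)
  request = Net::HTTP::Put.new("/#{index}/#{type}/#{id}", 'Content-Type' => 'application/json')
  request.body = JSON.generate(document)
  request
end

# Search API: POST a query to /<index>/_search.
def build_search_request(index, query)
  request = Net::HTTP::Post.new("/#{index}/_search", 'Content-Type' => 'application/json')
  request.body = JSON.generate({ query: query })
  request
end

index_request  = build_index_request('events', 'event', 1, { title: 'Drop in Ruby' })
search_request = build_search_request('events', { simple_query_string: { query: 'tokyo rubyist' } })

# To actually send a request against a running node:
#   Net::HTTP.start(ES_URL.host, ES_URL.port) { |http| http.request(index_request) }
puts index_request.method, index_request.path, index_request.body
puts search_request.method, search_request.path, search_request.body
```

In a Rails app these calls are normally hidden behind a client library rather than written by hand.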

For Ruby / Rails

• https://github.com/elastic/elasticsearch-rails

• gems for Rails:

• elasticsearch-model & elasticsearch-rails

• without Rails / AR:

• elasticsearch-persistence

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      title: title,
      description: description,
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :title, type: 'string'
      indexes :description, type: 'string'
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end

Event.import

Elasticsearch cluster

• Index: events

  • Type: event (doc 1)

• Index: creations

  • Type: creation (doc 1)

  • Type: activity (doc 1, doc 2)

Documents, not relationships

compose documents with all relevant data

➜ "denormalize" your data

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      titles: [ title1, title2 ],
      locations: locs.map(&:as_indexed_json)
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :titles, type: 'string'
      indexes :locations, type: 'nested' do
        indexes :name, type: 'string'
        indexes :address, type: 'string'
        indexes :location, type: 'geo_point'
      end
    end
  end
end
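To make the denormalization step concrete without Rails, here is a minimal sketch with plain Ruby structs. `Location`, `Event`, their fields and the sample values are stand-ins invented for this example; the point is only that the indexed document carries copies of the related data instead of foreign keys.

```ruby
require 'json'

# Hypothetical stand-ins for ActiveRecord models.
Location = Struct.new(:name, :address, :lat, :lon) do
  def as_indexed_json
    { name: name, address: address, location: { lat: lat, lon: lon } }
  end
end

Event = Struct.new(:title1, :title2, :locs) do
  # Embed the related locations directly in the event document.
  def as_indexed_json
    { titles: [title1, title2], locations: locs.map(&:as_indexed_json) }
  end
end

venue = Location.new('Cafe Example', '1-2-3 Shibuya, Tokyo', 35.66, 139.70)
event = Event.new('Drop in Ruby', 'Ruby drop-in', [venue])

document = event.as_indexed_json
puts JSON.pretty_generate(document)
```

The trade-off is that a change to a location has to be pushed to every event document that embeds it.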

response = Event.search 'tokyo rubyist'

response.took                         # => 28
response.results.total                # => 2075
response.results.first._score         # => 0.921177
response.results.first._source.title  # => "Drop in Ruby"
response.page(2).results              # => second page of results

Pagination supports kaminari / will_paginate.

response = Event.search 'tokyo rubyist'

response.records.to_a     # => [#<Event id: 12409, ...>, ...]
response.page(2).records  # => second page of result records

response.records.each_with_hit do |rec,hit|
  puts "* #{rec.title}: #{hit._score}"
end
# * Drop in Ruby: 0.9205564
# * Javascript meets Ruby in Kamakura: 0.8947
# * Meetup at EC Navi: 0.8766844
# * Pair Programming Session #3: 0.8603562
# * Kickoff Party: 0.8265461
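Behind a call such as `page(2)`, pagination comes down to the `from` and `size` parameters of the search request. The helper below is a hypothetical sketch of that arithmetic, not the code kaminari or will_paginate actually run; the default of 25 results per page is an assumption.

```ruby
# Translate a 1-based page number into Elasticsearch's from/size parameters.
def pagination_params(page, per_page = 25)
  page = [page.to_i, 1].max # treat nil, 0 and negative values as page 1
  { from: (page - 1) * per_page, size: per_page }
end

p pagination_params(1)      # first page
p pagination_params(2)      # second page
p pagination_params(3, 10)  # third page with 10 results per page
```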

Event.search 'tokyo rubyist'

only upcoming events?

sorted by start date?

Event.search query: {
  filtered: {
    query: {                                       # our query
      simple_query_string: {
        query: 'tokyo rubyist',
        default_operator: 'and'
      }
    },
    filter: {                                      # filtered by conditions
      and: [
        { range: { starts_at: { gte: 'now' } } },
        { term: { featured: true } }
      ]
    }
  }
},
sort: { starts_at: { order: 'asc' } }              # sorted by start time

Query DSL

query: { <query_type>: <arguments> }    (scores matching documents)

filter: { <filter_type>: <arguments> }  (filters documents)

valid arguments depend on query / filter type
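The difference between the two can be mimicked in plain Ruby: a filter is a yes/no test, while a query assigns each matching document a score used for ranking. The toy scoring below (counting matching words) is only an illustration and is not how Elasticsearch computes relevance; the sample documents and method names are made up.

```ruby
DOCS = [
  { title: 'Tokyo Rubyist Meetup',   featured: true  },
  { title: 'Tokyo Ruby Tokyo Night', featured: false },
  { title: 'Berlin Rubyist Meetup',  featured: true  }
].freeze

# Filter: keeps or drops a document, no score involved.
def featured_filter(docs)
  docs.select { |doc| doc[:featured] }
end

# Query: gives every document a score (here: number of occurrences of the terms).
def score(doc, terms)
  words = doc[:title].downcase.split
  terms.sum { |term| words.count(term) }
end

# Apply the filter first, then rank the remaining matches by score.
def search(docs, terms)
  featured_filter(docs)
    .map { |doc| [doc, score(doc, terms)] }
    .reject { |_, s| s.zero? }
    .sort_by { |_, s| -s }
end

search(DOCS, %w[tokyo rubyist]).each do |doc, s|
  puts "#{doc[:title]}: #{s}"
end
```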

Query types: Match Query, Multi Match Query, Bool Query, Boosting Query, Common Terms Query, Constant Score Query, Dis Max Query, Filtered Query, Fuzzy Like This Query, Fuzzy Like This Field Query, Function Score Query, Fuzzy Query, GeoShape Query, Has Child Query, Has Parent Query, Ids Query, Indices Query, Match All Query, More Like This Query, Nested Query, Prefix Query, Query String Query, Simple Query String Query, Range Query, Regexp Query, Span First Query, Span Multi Term Query, Span Near Query, Span Not Query, Span Or Query, Span Term Query, Term Query, Terms Query, Top Children Query, Wildcard Query, Minimum Should Match, Multi Term Query Rewrite, Template Query.

Filter types: And Filter, Bool Filter, Exists Filter, Geo Bounding Box Filter, Geo Distance Filter, Geo Distance Range Filter, Geo Polygon Filter, GeoShape Filter, Geohash Cell Filter, Has Child Filter, Has Parent Filter, Ids Filter, Indices Filter, Limit Filter, Match All Filter, Missing Filter, Nested Filter, Not Filter, Or Filter, Prefix Filter, Query Filter, Range Filter, Regexp Filter, Script Filter, Term Filter, Terms Filter, Type Filter.

Event.search query: {
  bool: {
    should: [
      {
        simple_query_string: {
          query: "tokyo rubyist",
          default_operator: "and"
        }
      },
      {
        function_score: {
          filter: {
            and: [
              { range: { starts_at: { lte: 'now' } } },
              { term: { featured: true } }
            ]
          },
          gauss: {
            starts_at: { origin: 'now', scale: '10d', decay: 0.5 }
          },
          boost_mode: "sum"
        }
      }
    ],
    minimum_should_match: 2
  }
}
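The `gauss` clause in the query above scores events by how close `starts_at` is to `origin`: with `scale: '10d'` and `decay: 0.5`, an event ten days away from now gets a decay value of 0.5. The sketch below follows the Gaussian decay formula given in the Elasticsearch documentation for `function_score`; the function name is made up and distances are expressed in plain days for simplicity.

```ruby
# Gaussian decay as documented for Elasticsearch's function_score query:
#   score    = exp(-(max(0, distance - offset))**2 / (2 * sigma_sq))
#   sigma_sq = -scale**2 / (2 * ln(decay))
def gauss_decay(distance, scale:, decay: 0.5, offset: 0.0)
  sigma_sq  = -(scale**2) / (2.0 * Math.log(decay))
  effective = [0.0, distance.abs - offset].max
  Math.exp(-(effective**2) / (2.0 * sigma_sq))
end

[0, 5, 10, 20].each do |days|
  puts format('%2d days from now: %.3f', days, gauss_decay(days, scale: 10))
end
```

At a distance equal to `scale` the function returns exactly `decay`, which is how the two parameters are meant to be read.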

Create service objects

class EventSearch
  def initialize
    @filters = []
  end

  def starting_after(time)
    tap { @filters << { range: { starts_at: { gte: time } } } }
  end

  def featured
    tap { @filters << { term: { featured: true } } }
  end

  def in_group(group_id)
    tap { @filters << { term: { group_id: group_id } } }
  end
end
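The slide shows only the filter-collecting methods. One way such a service object might be finished is sketched below; the class name `EventSearchSketch`, the `text` constructor argument and the `to_query` method are assumptions for illustration, not the speaker's code. It reuses the `filtered` query from the earlier slides, which Elasticsearch deprecated in 2.0 and removed in 5.0.

```ruby
# Hypothetical completion of the service-object idea; names are illustrative.
class EventSearchSketch
  def initialize(text)
    @text = text
    @filters = []
  end

  def starting_after(time)
    tap { @filters << { range: { starts_at: { gte: time } } } }
  end

  def featured
    tap { @filters << { term: { featured: true } } }
  end

  # Assemble the search definition in the style used on the earlier slides.
  def to_query
    {
      query: {
        filtered: {
          query: { simple_query_string: { query: @text, default_operator: 'and' } },
          filter: { and: @filters }
        }
      },
      sort: { starts_at: { order: 'asc' } }
    }
  end
end

definition = EventSearchSketch.new('tokyo rubyist').starting_after('now').featured.to_query
p definition
# In a Rails app this hash would be passed on, e.g. Event.search(definition).
```

Because every filter method returns the object itself via `tap`, calls can be chained in any order and the query hash is built only once at the end.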

Event.search '東京rubyist'

Dealing with different languages

Built-in analysers for: arabic, armenian, basque, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, thai.

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  def as_indexed_json(options={})
    {
      title: { en: title_en, de: title_de, ja: title_ja },
      description: { en: desc_en, de: desc_de, ja: desc_ja },
      starts_at: starts_at.iso8601,
      featured: group.featured?
    }
  end

  settings do
    mapping dynamic: 'false' do
      indexes :title do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :de, type: 'string', analyzer: 'german'
        indexes :ja, type: 'string', analyzer: 'cjk'
      end
      indexes :description do
        indexes :en, type: 'string', analyzer: 'english'
        indexes :de, type: 'string', analyzer: 'german'
        indexes :ja, type: 'string', analyzer: 'cjk'
      end
      indexes :starts_at, type: 'date'
      indexes :featured, type: 'boolean'
    end
  end
end
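With one sub-field per language, a search has to name those sub-fields explicitly. A possible way to do that is a `multi_match` query over all of them, sketched below; the helper name and its defaults are assumptions, and the field names follow the mapping above.

```ruby
# Build a multi_match query over the per-language sub-fields of the mapping above.
LANGUAGES = %w[en de ja].freeze

def multilingual_query(text, fields: %w[title description], languages: LANGUAGES)
  {
    multi_match: {
      query: text,
      # Combine every field with every language: title.en, title.de, ...
      fields: fields.product(languages).map { |field, lang| "#{field}.#{lang}" }
    }
  }
end

query = multilingual_query('東京rubyist')
p query
# In a Rails app: Event.search(query: query)
```

Each sub-field is then analysed with its own analyser, so the same input text is tokenized differently per language.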

Changes to mappings?

⚠ can't change field types / analysers ⚠

but: we can add new field mappings

class AddCreatedAtToES < ActiveRecord::Migration
  def up
    client = Elasticsearch::Client.new
    client.indices.put_mapping(
      index: Event.index_name,
      type: Event.document_type,
      body: {
        properties: {
          created_at: { type: 'date' }
        }
      }
    )
    Event.__elasticsearch__.import
  end

  def down
  end
end
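Before writing such a migration it can help to check whether a planned mapping change is purely additive. The helper below is a hypothetical sketch in plain Ruby (it does not talk to Elasticsearch): it compares two `properties` hashes and lists the fields whose definition would change, which are the ones that cannot be updated in place.

```ruby
# Compare two "properties" hashes and report existing fields whose definition changed.
# Newly added fields are fine; changed ones need a new index and a full re-import.
def mapping_conflicts(current, desired)
  current.each_with_object([]) do |(field, definition), conflicts|
    next unless desired.key?(field)
    conflicts << field if desired[field] != definition
  end
end

current = { title: { type: 'string' }, starts_at: { type: 'date' } }

additive = current.merge(created_at: { type: 'date' })
changed  = current.merge(title: { type: 'string', analyzer: 'english' })

p mapping_conflicts(current, additive) # only a new field was added
p mapping_conflicts(current, changed)  # the title field changed its analyser
```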

Automated tests

Index names with environment

class Event < ActiveRecord::Base
  include Elasticsearch::Model

  index_name "drkpr_#{Rails.env}_events"
end

Test helpers

• everything is asynchronous!

• Helpers: wait_for_elasticsearch, wait_for_elasticsearch_removal, clear_elasticsearch! ➜ https://gist.github.com/mreinsch/094dc9cf63362314cef4

• specs: Tag tests which require elasticsearch
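Because indexing is asynchronous, a spec has to wait until a document is actually searchable. The linked gist contains the real helpers; the sketch below is not that code, only a generic polling helper with made-up names and timeouts, shown against a simulated index instead of a live Elasticsearch.

```ruby
# Poll a block until it returns a truthy value or the timeout expires.
def wait_until(timeout: 5.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Simulated asynchronous index: the document becomes visible on the third check.
checks = 0
found = wait_until(timeout: 2.0, interval: 0.01) do
  checks += 1
  checks >= 3
end
puts "visible after #{checks} checks" if found
```

In a real spec the block would run a search and compare the number of hits with the expected count.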

Production ready?

• use elastic.co/found or AWS ES

• use two clustered instances for redundancy

• Elasticsearch could go away

• keep impact at a minimum!

• update Elasticsearch from background worker
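One way to keep the impact small when Elasticsearch is unreachable is to wrap search calls so the page still renders with an empty or fallback result. This is a minimal sketch under assumptions: `SearchUnavailable` is a stand-in for whatever connection errors the real client raises, and `with_search_fallback` is a made-up helper name.

```ruby
# Stand-in for the connection errors a real client would raise
# (hypothetical class for this sketch).
class SearchUnavailable < StandardError; end

# Run a search block; if the search backend is unreachable, log and return a fallback.
def with_search_fallback(fallback: [])
  yield
rescue SearchUnavailable => e
  warn "search unavailable: #{e.message}"
  fallback
end

healthy  = with_search_fallback { ['Drop in Ruby'] }
degraded = with_search_fallback { raise SearchUnavailable, 'connection refused' }

p healthy
p degraded
```

The same idea applies to index updates: running them in a background worker means a failed update can be retried later instead of breaking the web request.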

Questions?

Resources:

Elastic Docs https://www.elastic.co/guide/index.html

Ruby Gem Docs https://github.com/elastic/elasticsearch-rails

Elasticsearch rspec helpers https://gist.github.com/mreinsch/094dc9cf63362314cef4

Elasticsearch indexer job example https://gist.github.com/mreinsch/acb2f6c58891e5cd4f13

or ask me later:

michael@movingfast.io @mreinsch