ElasticSearch for .NET Developers

Date post: 16-Feb-2017
Upload: ben-van-mol
Page 1: ElasticSearch for .NET Developers

Ben van Mol
ElasticSearch for .NET

Page 2: ElasticSearch for .NET Developers

SEARCH ENGINE
Why would I need one?

Page 3: ElasticSearch for .NET Developers

Search is more than text comparison

Page 4: ElasticSearch for .NET Developers

Search must advise

Page 5: ElasticSearch for .NET Developers

Search must be intelligent

Page 6: ElasticSearch for .NET Developers

Search must aggregate

Page 7: ElasticSearch for .NET Developers

What is ElasticSearch?

“flexible and powerful open-source, distributed (NoSQL), RESTful search engine built on top of Lucene” (http://www.elastic.co)

Features: real-time data, real-time analytics, distributed, high availability, multi-tenancy, full-text search, document oriented, conflict management, schema free, RESTful API, per-operation persistence, Apache 2 open-source license, built on top of Apache Lucene.

Page 8: ElasticSearch for .NET Developers

Installation

Procedure

- Java based; requires Java 7 or later
- The same JVM version is required on all nodes
- Set a number of environment variables
- Fill in the ElasticSearch config files

A streamlined installation is available for Windows (local service): https://github.com/rgl/elasticsearch-setup/releases

Page 9: ElasticSearch for .NET Developers

Scalability & performance

Page 10: ElasticSearch for .NET Developers

Scalability

NoSQL databases are generally easier to scale out and can provide better performance, and their data model addresses several issues that the relational model was not designed to address:

- Structured & fixed data model vs. dynamic model

- Efficient, scale-out architecture instead of expensive, monolithic architecture (scale-up)

- Object-oriented programming that is easy to use and flexible

Data representation in JSON

Page 11: ElasticSearch for .NET Developers

Scalability - Architecture

Cluster: a logical grouping of multiple nodes

Node: an ElasticSearch server instance
- Master: the node in charge of managing cluster-wide operations
- Only one master per cluster
- The master is not a bottleneck for queries

Shard: a low-level worker instance that holds a slice of all the data
- Each document belongs to a single primary shard
- Primary shards are created during index creation
- The number of primary shards determines the amount of data stored in each shard

Replica: a copy of a primary shard on a different node
- Can be created at any time
- Spreading over nodes is done automatically

Page 12: ElasticSearch for .NET Developers

Create an index

POST /<index name>
{
  "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 }
}

[Figure: shard allocation on 1 node, 2 nodes, 3 nodes, and 3 nodes with 2 replicas]

Having more replica shards on the same number of nodes doesn't increase performance at all, because each shard has access to a smaller fraction of its node's resources, but it does add redundancy.
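As a rough illustration of the numbers on this slide, the sketch below counts how many shard copies an index has for given `number_of_shards` and `number_of_replicas` settings, and how many copies each node holds on average. The function name `shard_layout` is hypothetical, and the sketch only does the arithmetic; it does not model how ElasticSearch actually allocates shards.

```python
def shard_layout(number_of_shards, number_of_replicas, nodes):
    """Count shard copies for one index and the average number per node."""
    primaries = number_of_shards
    # Every primary shard gets `number_of_replicas` copies.
    replicas = number_of_shards * number_of_replicas
    total = primaries + replicas
    return {
        "primaries": primaries,
        "replicas": replicas,
        "total": total,
        "avg_per_node": total / nodes,
    }


# 3 primary shards with 1 replica each, on 3 nodes
print(shard_layout(3, 1, 3))
# Same 3 nodes, but 2 replicas: more copies share the same hardware
print(shard_layout(3, 2, 3))
```

Going from 1 to 2 replicas on the same 3 nodes raises the average from 2 to 3 shard copies per node, which is the "smaller fraction of its node's resources" effect described above.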

Page 13: ElasticSearch for .NET Developers

Default Routing

Hashes the ID of a document and uses that to find a shard (retrieve document). Gives an even distribution of documents across the entire set of shards

But what about search?

- Incoming request
- Broadcast & query all shards
- Aggregate all results & send back
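The idea behind default routing can be sketched in a few lines. This is an illustration only: `zlib.crc32` is a stand-in hash chosen for the example (ElasticSearch uses its own hash function), and the helper names `shard_for` and `shards_to_query` are hypothetical.

```python
import zlib


def shard_for(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document ID) to a shard number."""
    # Stand-in hash for illustration; not the hash ElasticSearch uses.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


def shards_to_query(number_of_primary_shards):
    """A search without a routing value must visit a copy of every shard."""
    return list(range(number_of_primary_shards))


# Distribute 1000 hypothetical document IDs over 3 primary shards.
counts = [0, 0, 0]
for doc_id in range(1, 1001):
    counts[shard_for(str(doc_id), 3)] += 1

print(counts)              # documents per shard
print(shards_to_query(3))  # shards hit by a search
```

Retrieving a document by ID only needs the one shard that `shard_for` points to, while a search has to fan out to all shards and merge the results.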

Page 14: ElasticSearch for .NET Developers

Custom Routing

Configure routing for a certain type:

PUT /<index name>/<type>/_mapping
{ "order": { "_routing": { "required": true, "path": "customerID" } } }

Search for a specific document of user user123:

GET /<index name>/<type>/_search?routing=user123
{ "query": { "match_all": {} } }

Tell ElasticSearch which property to use to determine routing, e.g. zipcode, age, …

Default routing ensures that distribution is fairly uniform across all shards.

Once you start implementing your own custom schemes, it is entirely possible that this uniformity is lost.
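A small sketch of why custom routing can lose uniformity. The order data is made up for the example (one large customer, ten small ones), and `zlib.crc32` is again only a stand-in for the real hash function.

```python
import zlib
from collections import Counter


def shard_for(routing_value, number_of_primary_shards):
    # Stand-in hash for illustration; not the hash ElasticSearch uses.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


# Hypothetical orders as (customerID, order number): user123 places 90 orders,
# ten other customers place one order each.
orders = [("user123", n) for n in range(90)]
orders += [("user%d" % c, 0) for c in range(10)]

# Route every order by its customerID instead of its document ID.
per_shard = Counter(shard_for(customer_id, 3) for customer_id, _ in orders)
target = shard_for("user123", 3)

print(per_shard)  # one shard holds at least 90 of the 100 orders
print(target)     # the only shard a search with routing=user123 must visit
```

The upside is that a search with `routing=user123` touches a single shard; the downside is that the shard holding the large customer grows much faster than the others.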

Page 15: ElasticSearch for .NET Developers

Advanced Search Capabilities

Page 16: ElasticSearch for .NET Developers

Dealing with human language

Indexing

Example: <div>Here is some example text including an extract of 9 poems</div>

Analyzers
- Character filters: convert 9 to nine, strip HTML and extract the actual text, lower-case all words
- Tokenizer: create individual terms or tokens from text, minding commas, whitespace, periods, hyphens, …
- Token filter: remove stopwords like ‘an’, ‘the’, …; stemming: reduce verbs and words to their stem

{Here} {is} {some} {example} {text} {including} {extract} {nine} {poems}
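The three analyzer stages can be mimicked with plain string processing. The sketch below is not ElasticSearch code: the number table and stopword list are tiny assumptions made for this one sentence, lower-casing is done in the token filter stage, and stemming is left out.

```python
import re

NUMBER_WORDS = {"9": "nine"}      # assumption: only the digit used in the example
STOPWORDS = {"an", "the", "of"}   # assumption: tiny illustrative stopword list


def char_filter(text):
    """Work on the raw character stream: strip HTML, spell out digits."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\d+", lambda m: NUMBER_WORDS.get(m.group(), m.group()), text)


def tokenize(text):
    """Split the text into individual tokens on anything that is not a letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filter(tokens):
    """Lower-case every token and drop stopwords."""
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


print(analyze("<div>Here is some example text including an extract of 9 poems</div>"))
```

The order matters: character filters see the whole string, the tokenizer produces tokens, and token filters can only add, change or remove tokens.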

Page 17: ElasticSearch for .NET Developers

Text Analysis - Experiments

Whitespace
- Whitespace tokenizer: a tokenizer of type whitespace that divides text at whitespace.

Sentence: Convert the title-case text using the ToLower(string) command.

Result: {Convert} {the} {title-case} {text} {using} {the} {ToLower(string)} {command.}

Page 18: ElasticSearch for .NET Developers

Text Analysis - Experiments

Simple
- Standard tokenizer: a tokenizer of type standard providing a grammar-based tokenizer that is a good tokenizer for most European language documents.
- Lower-case token filter

Sentence: Convert the title-case text using the ToLower(string) command.

Result: {convert} {the} {title} {case} {text} {using} {the} {tolower} {string} {command}
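The difference between the first two experiments can be reproduced with plain Python. This is only an approximation for this sentence: `str.split` stands in for the whitespace tokenizer, and a lower-cased regular expression stands in for a grammar-based tokenizer followed by a lower-case filter. Both function names are hypothetical.

```python
import re

SENTENCE = "Convert the title-case text using the ToLower(string) command."


def whitespace_tokens(text):
    """Split on whitespace only; punctuation stays attached to the tokens."""
    return text.split()


def lowercase_word_tokens(text):
    """Lower-case the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


print(whitespace_tokens(SENTENCE))
print(lowercase_word_tokens(SENTENCE))
```

With whitespace splitting, a search for `tolower` would not match the token `ToLower(string)`; after the second approach it would.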

Page 19: ElasticSearch for .NET Developers

Text Analysis - Experiments

Stop analyzer
- Standard tokenizer
- Lower-case token filter
- Stop token filter: a token filter of type stop that removes stop words (words that are meaningless for search) from token streams
- Support for multiple languages

Sentence: Convert the title-case text using the ToLower(string) command.

Result: {convert} {title} {case} {text} {using} {tolower} {string} {command}

Page 20: ElasticSearch for .NET Developers

Text Analysis - Experiments

Snowball
- Standard tokenizer
- Lower-case token filter
- Stop token filter
- Stemming: a filter that stems words (reduces a word to its core) using a Snowball-generated stemmer
- Support for multiple languages

Sentence: Convert the title-case text using the ToLower(string) command.

Result: {convert} {title} {case} {text} {use} {tolower} {string} {command}

Page 21: ElasticSearch for .NET Developers

Text Analysis - Adding Custom Analyzers

PUT /my-index/_settings
{
  "index": {
    "analysis": {
      "analyzer": {
        "YourCustomAnalyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop", "snowball" ]
        }
      }
    }
  }
}

A list of available analysis tools:
- Character filters: http://bit.ly/1H3hgJF
- Tokenizers: http://bit.ly/1zIU2IO
- Token filters: http://bit.ly/1AJXCO2

Possible to create your own combination!

Page 22: ElasticSearch for .NET Developers

Text Analysis – Define analyzer

- Create a mapping type (cf. a table)
- Assign fields
- Define field types (string, int, date, …)
- Define the analyzer to be used
- Define the boost value on a field
- Define the routing
- …

PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "english_title": { "type": "string", "analyzer": "english" }
    }
  }
}

Page 23: ElasticSearch for .NET Developers

ELASTIC AND .NET
Let’s get dirty!

Page 24: ElasticSearch for .NET Developers

What is NEST?

NEST

• All request & response objects represented
• Strongly typed Query DSL implementation
• Supports fluent syntax
• Uses ElasticSearch.NET

ElasticSearch.NET

• Low-level, dependency-free client
• All ES endpoints are available as methods

ElasticSearch RESTful API

http://nest.azurewebsites.net/

Page 25: ElasticSearch for .NET Developers

NEST – Connection Initialization

Initialize an ElasticClient:

All actions on the ElasticSearch cluster are performed using the ElasticClient

For example:
- Search
- Index
- DeleteIndex/CreateIndex
- …

Uri node = new Uri("http://192.168.137.73:9200");
ConnectionSettings settings = new ConnectionSettings(node, defaultIndex: "products");
ElasticClient client = new ElasticClient(settings);

Page 26: ElasticSearch for .NET Developers

Index your content

JSON

PUT http://localhost:9200/products/product/1
{
  "id": "1",
  "name": "MacBook Air",
  "price": 1099,
  "descr": "Some lengthy never-read description",
  "attributes": { "color": "silver", "display": 13.3, "ram": 4 }
}

.NET
- Index the raw JSON string
- Index a type
  - Automatically infers index, type and ID
  - Use ElasticType to define type behavior
  - Use ElasticProperty to define field behavior
  - Define explicit values for inferred ones

More information: http://nest.azurewebsites.net/nest/index-type-inference.html

Page 27: ElasticSearch for .NET Developers

Index your Content - .NET

Raw JSON string

client.Raw.Index("products", "product", new JavaScriptSerializer().Serialize(prod));

Type-based indexing

client.Index(product);

Modify out-of-the-box behavior using attributes

[ElasticType(Name = "Product", IdProperty = "id")]
public class Product
{
    public int id { get; set; }

    [ElasticProperty(Name = "name", Index = FieldIndexOption.Analyzed, Type = FieldType.String, Analyzer = "standard")]
    public string name { get; set; }

    // … (remaining properties are not shown on the slide)
}

Page 28: ElasticSearch for .NET Developers

Query your content – JSON Query

JSON examples
http://localhost:9200/products/product/_search

Term-level queries such as term and prefix are not analyzed: "MacBook Air" and "Mac" return nothing if the analyzer lower-cased the indexed text and split it on whitespace!

{ "query" : { "term" : { "name": "MacBook Air" } } }
{ "query" : { "prefix" : { "name": "Mac" } } }
{ "query" : { "range" : { "price" : { "from" : 1000, "to": 2000 } } } }
{ "from": 0, "size": 10, "query" : { "term" : { "name": "MacBook Air" } } }
{ "sort" : { "name" : { "order": "asc" } }, "query" : { "term" : { "name": "MacBook Air" } } }
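Why a term query for "MacBook Air" can come back empty is easier to see with a toy inverted index. The sketch below is not ElasticSearch code: the `analyze` function approximates a lower-casing analyzer, the second product is made up, and `term_query` and `match_query` are hypothetical helpers that imitate an un-analyzed and an analyzed lookup.

```python
import re
from collections import defaultdict


def analyze(text):
    """Approximate analyzer: lower-case and split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Build a toy inverted index: token -> set of document IDs.
docs = {1: "MacBook Air", 2: "MacBook Pro"}  # document 2 is hypothetical
index = defaultdict(set)
for doc_id, name in docs.items():
    for token in analyze(name):
        index[token].add(doc_id)


def term_query(term):
    """Exact lookup: the query text is NOT analyzed."""
    return sorted(index.get(term, set()))


def match_query(text):
    """The query text is analyzed first; a document matches on any token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


print(term_query("MacBook Air"))   # no such token in the index
print(term_query("macbook"))       # matches the indexed token
print(match_query("MacBook Air"))  # analyzed into "macbook" and "air"
```

The index only contains the tokens `macbook`, `air` and `pro`, so the exact string "MacBook Air" never matches; lower-casing the term, or using a query that analyzes its input, does.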

Page 29: ElasticSearch for .NET Developers

Query your content – JSON Result

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.076713204,
    "hits": [
      {
        "_index": "products",
        "_type": "Product",
        "_id": "1",
        "_score": 0.076713204,
        "_source": {
          "id": 1,
          "name": "MacBook Air",
          "price": 1099.0,
          "descr": "Some lengthy never-read description",
          "attributes": { "color": "silver", "display": 13.300000190734863, "ram": 4 }
        }
      },
      …

Page 30: ElasticSearch for .NET Developers

Query your content – Query DSL .NET

Retrieve all products from an index using a MatchAll search

result = client.Search<Product>(s => s.MatchAll());

Retrieve matching products by using a term query

result = client.Search<Product>(s => s.Query(q => q.Term(t => t.name, "macbook")));
result = client.Search<Product>(s => s.Query(q => q.Term("name", "macbook")));

Search on all fields using the _all built-in property

result = client.Search<Product>(s => s.Query(q => q.Term("_all", "macbook")));

Search on a combination of fields using boolean operators (see Fiddler result)

result = client.Search<Product>(s => s.Query(q => q.Term("name", "macbook") || q.Term("descr", "macbook")));

Page 31: ElasticSearch for .NET Developers

Query your content – Query DSL

Search on a combination of fields using boolean operators and a range filter on price

result = client.Search<Product>(s => s
    .Query(q => (q.Term("name", "macbook") || q.Term("descr", "macbook"))
        && q.Range(r => r
            .OnField("price")
            .Greater(1000)
            .LowerOrEquals(2000)
        )));

Some more advanced query examples:
- Wildcard query: use wildcards to search for relevant documents
- Span near: search for word combinations within a certain span in the document
- More like this query: finds documents which are ‘like’ a given set of documents using representative terms

More information: http://bit.ly/1A6wpKs

Page 32: ElasticSearch for .NET Developers

Query your content – Fuzzy searches

Perform a fuzzy search to tolerate spelling errors in the query string

result = client.Search<Product>(s => s
    .Query(q => q
        .Match(m => m
            .Query("makboek")
            .OnField("name")
            .Fuzziness(10)
            .PrefixLength(1)
        )));

Page 33: ElasticSearch for .NET Developers

Query your content - Paging

Select pages from the full result set using the From & Size parameters

result = client.Search<Product>(s => s
    .Query(q => q.Term("name", "macbook") || q.Term("descr", "macbook"))
    .From(0)
    .Size(1));
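From and Size behave like an offset and a page length over the ordered list of hits. A minimal sketch on an in-memory list; the `page` helper and the hit names are hypothetical.

```python
def page(hits, from_, size):
    """Return `size` hits starting at offset `from_` (0-based)."""
    return hits[from_:from_ + size]


# Seven hypothetical hits, already sorted by score.
hits = ["doc-%d" % i for i in range(1, 8)]

first_page = page(hits, 0, 3)
last_page = page(hits, 6, 3)

print(first_page)  # the first three hits
print(last_page)   # a partial page: only one hit is left
```

To show page number `p` (starting at 1) with `size` hits per page, the offset is `(p - 1) * size`.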

Page 34: ElasticSearch for .NET Developers

Query your content – Hit Highlighting

.NET Code JSON Result

Hit Highlighting

Possible to add other Pre- and Post-tags on specific fields

result = client.Search<Product>(s => s
    .Query(q => q.Term("name", "macbook"))
    .Highlight(h => h
        .PreTags("<b>")
        .PostTags("</b>")
        .OnFields(f => f
            .OnField(e => e.name))));
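What the pre- and post-tags do to a matching field value can be imitated with a regular expression. This is a simplified sketch, not how ElasticSearch builds highlight fragments; the `highlight` function is hypothetical and matches whole words case-insensitively.

```python
import re


def highlight(text, term, pre_tag="<b>", post_tag="</b>"):
    """Wrap every whole-word, case-insensitive occurrence of `term` in tags."""
    pattern = re.compile(r"\b%s\b" % re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


print(highlight("MacBook Air", "macbook"))
print(highlight("MacBook Air", "air", pre_tag="<em>", post_tag="</em>"))
```

The original casing of the field value is kept; only the tags are added around the hit.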

Page 35: ElasticSearch for .NET Developers

Query your content – Aggregations

.NET Code JSON Result

Aggregations group documents based on term values

Useful to create a faceted search interface

result = client.Search<Product>(s => s
    .Aggregations(a => a
        .Terms("color", st => st
            .Field(o => o.attributes.color))));
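A terms aggregation is essentially a count of documents per distinct field value. The sketch below does that on an in-memory list; the product data and the `terms_aggregation` helper are made up for the example, and the `key` / `doc_count` names are chosen to resemble the bucket fields of an aggregation response.

```python
from collections import Counter

# Hypothetical product documents.
products = [
    {"name": "MacBook Air", "attributes": {"color": "silver"}},
    {"name": "MacBook Pro", "attributes": {"color": "silver"}},
    {"name": "ThinkPad", "attributes": {"color": "black"}},
]


def terms_aggregation(docs, field_path):
    """Count documents per value of a (possibly nested) field, most frequent first."""
    counts = Counter()
    for doc in docs:
        value = doc
        for key in field_path.split("."):
            value = value[key]
        counts[value] += 1
    return [{"key": k, "doc_count": n} for k, n in counts.most_common()]


print(terms_aggregation(products, "attributes.color"))
```

In a faceted search interface, each bucket becomes a clickable filter such as "silver (2)".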

Page 36: ElasticSearch for .NET Developers

Query your content – Suggesters

Did you mean

Term suggester
- Suggests terms based on edit distance (= number of operations needed to change one term into another)
- More info: http://bit.ly/1FDFPwr

Phrase suggester
- Adds additional logic on top of the term suggester to select entire corrected phrases instead of individual tokens, weighted based on n-gram language models
- Provides better suggestions because of co-occurrence & frequency
- More info: http://bit.ly/1FbfAKg
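Edit distance can be computed with the classic Levenshtein dynamic-programming algorithm. The sketch below uses it for a naive "did you mean" lookup over a small, made-up dictionary; it ignores term frequency and everything else the real term suggester takes into account, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(term, dictionary, max_edits=2):
    """Return dictionary words within `max_edits` of `term`, closest first."""
    scored = sorted((edit_distance(term, w), w) for w in dictionary)
    return [w for d, w in scored if 0 < d <= max_edits]


print(edit_distance("makboek", "macbook"))
print(suggest("makboek", ["macbook", "notebook", "air"]))
```

"makboek" is two substitutions away from "macbook" (k to c, e to o), so it is suggested with `max_edits=2`, while the other words are too far away.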

Page 37: ElasticSearch for .NET Developers

Query your content – Suggesters

Search as you type

Completion suggester
- A so-called prefix suggester
- Does not do spell correction like the term or phrase suggesters, but allows basic auto-complete functionality
- Uses FST (finite state transducer) models and makes them part of the index for faster querying
- More info: http://bit.ly/1HwFKbO

hotel, marriot, mercure, munchen and munich
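Prefix completion over the example terms can be sketched with a sorted list and binary search. This is a deliberately simple stand-in: the completion suggester uses an FST, not a sorted list, and the helper names `build_suggester` and `complete` are hypothetical.

```python
import bisect


def build_suggester(terms):
    """Prepare the lookup structure: a sorted list of lower-cased terms."""
    return sorted(t.lower() for t in terms)


def complete(sorted_terms, prefix, size=5):
    """Return up to `size` terms that start with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    suggestions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(suggestions) == size:
            break
        suggestions.append(term)
    return suggestions


terms = build_suggester(["hotel", "marriot", "mercure", "munchen", "munich"])

print(complete(terms, "m"))
print(complete(terms, "mun"))
```

Because the lookup only follows the typed prefix, a misspelled prefix returns nothing, which is the "no spell correction" limitation noted above.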

Page 38: ElasticSearch for .NET Developers

QUESTIONS?

