Posted: 27-Jan-2015 | Category: Technology | Uploaded by: sematext-group-inc
Round 2
Battle of the Giants
Rafał Kuć, Sematext Group, Inc. | @kucrafal @sematext | sematext.com
Apache Solr VS ElasticSearch
I am a…
Sematext consultant & engineer
Solr Cookbook series author
"ElasticSearch Server" author
"Mastering ElasticSearch" author
Solr.pl co-founder
Father and husband
Copyright 2013 Sematext Group. Inc. All rights reserved
Under the Hood
Both built on Lucene 4.3
Expectations
Scalability
Fault tolerance
High availability
Features
Manageability
Ease of installation
Tools support
Expectations vs Reality
ElasticSearch: only ElasticSearch nodes, single leader
Apache Solr: Solr + ZooKeeper, leader per shard
Distributed
Fault tolerant
Automatic leader election
All Time Top Committers
Active Contributors
The Code
The Mailing Lists
Trends
Collection vs Index
Collections and Indices can be spread among different nodes in the cluster
Apache Solr: collection, the main logical index
ElasticSearch: index, the main logical structure
Apache Solr Index Structure
Fields and types defined in the schema
Automatic value copying
Dynamic fields
Custom similarity
Custom postings format
Multiple document types require a shared schema
Can be read using the API
ElasticSearch Index Structure
Schema-less
Fields and types defined with the HTTP API
Multi-field support
Nested and parent-child documents
Custom similarity
Custom postings format
Multiple document types with different structures
Can be read and written using the API
Shards and Replicas
Many shards
0 or more replicas
A replica can become the leader
Replicas can be created on a live cluster
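As a rough mental model of "many shards": both engines pick a shard for each document by hashing a routing value (the document id by default). ElasticSearch takes the hash modulo the number of primary shards, while Solr looks up which shard's hash range contains it. The sketch below shows only the modulo variant and uses CRC32 from Python's zlib as a stand-in hash; neither engine uses CRC32, and replicas are ignored because they simply hold copies of the chosen shard.

```python
import zlib

def pick_shard(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number (hypothetical stand-in for real routing)."""
    if num_shards < 1:
        raise ValueError("need at least one shard")
    # CRC32 is only a stand-in; the real engines use their own hash functions.
    h = zlib.crc32(doc_id.encode("utf-8"))
    return h % num_shards

# Route a few documents to one of 3 shards.
for doc_id in ["12345", "12346", "12347"]:
    print(doc_id, "->", pick_shard(doc_id, 3))
```

Because the mapping depends on the shard count, the number of primary shards cannot simply be changed later without moving documents.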
Configuration
Apache Solr: static in solrconfig.xml, can be reloaded with a core reload
ElasticSearch: static in elasticsearch.yml, changeable at runtime
Discovery
Zen Discovery (ElasticSearch) vs Apache ZooKeeper (Apache Solr)
Solr & ZooKeeper
Requires additional software
Prevents split-brain situations
Holds collection configurations
A ZooKeeper ensemble is needed
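The split-brain protection comes from majority voting: a ZooKeeper ensemble only makes progress while a strict majority of its servers is reachable, so the two halves of a partitioned cluster can never both act. The helper below only does that arithmetic (function names are illustrative). The same majority formula is the usual recommendation for ElasticSearch's discovery.zen.minimum_master_nodes setting, counted over master-eligible nodes.

```python
def quorum(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum(ensemble_size)

for n in (1, 2, 3, 4, 5):
    print(f"{n} servers: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

An even-sized ensemble tolerates no more failures than the odd size just below it, which is why ensembles of 3 or 5 servers are the usual choice.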
ElasticSearch Zen Discovery
Automatic node discovery
Multicast and unicast discovery methods
Automatic master detection
Two-way failure detection
HTTP FTW
ElasticSearch: HTTP REST API, or the query string for simple queries
Apache Solr: HTTP with the query string
Both provide a specialized Java API
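To make the difference concrete, the sketch below only builds request URLs and a request body; it sends nothing. It assumes default local ports (8983 for Solr, 9200 for ElasticSearch), a Solr core named collection1 and an ElasticSearch index named sematext (both names appear elsewhere in these slides); the field name title is hypothetical.

```python
import json
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/collection1"  # assumed local Solr core
ES = "http://localhost:9200/sematext"             # assumed local ElasticSearch index

def solr_select_url(query: str) -> str:
    # Solr: the whole request travels in the query string of the select handler.
    return f"{SOLR}/select?" + urlencode({"q": query, "wt": "json"})

def es_uri_search_url(query: str) -> str:
    # ElasticSearch: a simple Lucene-syntax query can go in the query string too.
    return f"{ES}/_search?" + urlencode({"q": query})

def es_dsl_body(query: str) -> str:
    # ElasticSearch: richer queries are sent as a JSON Query DSL request body.
    return json.dumps({"query": {"query_string": {"query": query}}})

print(solr_select_url("title:solr"))
print(es_uri_search_url("title:solr"))
print(es_dsl_body("title:solr"))
```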
Results Grouping
Group on: field value, query result, function query
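In Apache Solr these three modes correspond to the group.field, group.query and group.func request parameters (together with group=true). The toy function below mimics only the field-value case in plain Python: it walks hits already sorted by score and keeps at most `limit` documents per distinct field value, in the spirit of group.limit. The sample documents and the brand field are made up.

```python
def group_by_field(hits, field, limit=1):
    """Group score-sorted hits by a field value, keeping at most `limit` docs per group.

    Groups come out in the order of their best-scoring document."""
    groups = {}
    for doc in hits:  # hits must already be sorted by score, best first
        bucket = groups.setdefault(doc.get(field), [])
        if len(bucket) < limit:
            bucket.append(doc)
    return groups

hits = [
    {"id": "1", "brand": "acme", "score": 3.2},
    {"id": "2", "brand": "globex", "score": 2.9},
    {"id": "3", "brand": "acme", "score": 2.5},
    {"id": "4", "brand": "initech", "score": 1.1},
]
for brand, docs in group_by_field(hits, "brand").items():
    print(brand, [d["id"] for d in docs])
```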
Prospective Search
Called the Percolator in ElasticSearch
Matches documents against stored queries
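The Percolator inverts normal search: queries are registered first, and each incoming document is checked against all of them. The in-memory toy below shows only that idea. Real percolator queries are full Query DSL queries; here a stored query is just a hypothetical set of required field=value pairs.

```python
def register(store, name, **required_terms):
    """Store a named query: every field=value pair must match."""
    store[name] = required_terms

def percolate(store, doc):
    """Return names of stored queries the document satisfies, in registration order."""
    return [
        name for name, terms in store.items()
        if all(doc.get(field) == value for field, value in terms.items())
    ]

queries = {}
register(queries, "cheap_red", color="red", price_band="low")
register(queries, "any_red", color="red")
register(queries, "blue", color="blue")

print(percolate(queries, {"id": "1", "color": "red", "price_band": "high"}))  # ['any_red']
```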
Full Text Search Capabilities
Variety of queries
Control over score calculation
Different query parsers
Advanced Lucene queries
Score Calculation
Leverage Lucene scoring
Control the importance of: documents, queries, terms, phrases
Similarity configuration
Apache Solr and Score Influence
Index-time boosting
Query-time: term boosts, field boosts, phrase boosts, function queries, sub-queries used for boosting
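As an illustration of the query-time options, Solr's edismax query parser exposes field boosts through qf, phrase boosts through pf, additive function-query boosts through bf and extra boosting sub-queries through bq. The sketch below only assembles such a request URL; the core name, the field names (title, body, popularity, category) and the boost values are hypothetical.

```python
from urllib.parse import urlencode

def edismax_params(user_query: str) -> dict:
    return {
        "q": user_query,
        "defType": "edismax",
        "qf": "title^2 body",         # field boosts: title matches weigh twice as much
        "pf": "title^5",              # phrase boost: the whole query as a phrase in title
        "bf": "sqrt(popularity)",     # function query added to the score
        "bq": "category:featured^2",  # sub-query used only for boosting
        "wt": "json",
    }

url = "http://localhost:8983/solr/collection1/select?" + urlencode(edismax_params("battle of the giants"))
print(url)
```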
ElasticSearch and Score Influence
Index-time
Query-time:
Different queries provide different boost controls
Can calculate distributed term frequencies
Negative and positive boosting queries
Custom score filters
Scripts
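The "negative and positive boosting queries" item refers to ElasticSearch's boosting query: documents must match the positive query, and those that also match the negative query stay in the results but have their score multiplied by negative_boost. The sketch only builds the JSON body that would be sent to a _search endpoint; the field names and values are hypothetical.

```python
import json

def boosting_query(positive: dict, negative: dict, negative_boost: float) -> dict:
    # Documents must match `positive`; those also matching `negative`
    # are demoted (score multiplied by negative_boost), not excluded.
    return {"query": {"boosting": {
        "positive": positive,
        "negative": negative,
        "negative_boost": negative_boost,
    }}}

body = boosting_query(
    positive={"term": {"title": "solr"}},       # hypothetical field/value
    negative={"term": {"status": "outdated"}},  # hypothetical field/value
    negative_boost=0.2,
)
print(json.dumps(body, indent=2))
```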
ElasticSearch Query Rescore
Reorders the top N hits using another query
Executed on the shards before results are returned to the node handling the request
Not executed with the scan and count search types
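Conceptually, rescoring runs a cheap query first and then re-ranks only a top window of hits with a more expensive one, combining the two scores. The pure-Python sketch below assumes a weighted sum of both scores; the parameter names query_weight and rescore_query_weight are borrowed from ElasticSearch's rescore request, and the expensive scorer is a hypothetical stand-in function.

```python
def rescore(hits, window_size, rescore_fn, query_weight=1.0, rescore_query_weight=1.0):
    """Re-rank the first `window_size` (doc, score) hits with a second scoring function.

    In this toy, hits outside the window keep their original scores and are
    appended after the rescored window."""
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc, score * query_weight + rescore_fn(doc) * rescore_query_weight)
        for doc, score in window
    ]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored + rest

# (doc id, original score) pairs, already sorted by the first query.
hits = [("a", 5.0), ("b", 4.0), ("c", 3.0), ("d", 2.0)]
phrase_bonus = {"c": 4.0}  # hypothetical expensive second-pass score

print(rescore(hits, window_size=3, rescore_fn=lambda doc: phrase_bonus.get(doc, 0.0)))
# [('c', 7.0), ('a', 5.0), ('b', 4.0), ('d', 2.0)]
```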
ElasticSearch Nested Objects
Indexed as separate documents
Stored in the same part of the index as the root document
Hidden from standard queries and filters
Need appropriate queries and filters (nested)
Top-level documents can be sorted on the basis of nested ones
Solr Parent – Child Relationship
Used at query time
Multi-core joins possible
select?q={!join from=parent to=id}color:Yellow
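The join query above reads as: find the documents matching color:Yellow, collect their parent field values, and return the documents whose id is in that set. The sketch below reproduces that two-step semantics over an in-memory list; the documents are made up and scoring is ignored in this toy.

```python
def join(docs, from_field, to_field, matches):
    """Mimic {!join from=<from_field> to=<to_field>}<inner query> over in-memory docs."""
    # Step 1: run the inner query and collect the values of the `from` field.
    keys = {d[from_field] for d in docs if matches(d) and from_field in d}
    # Step 2: return documents whose `to` field value is among those keys.
    return [d for d in docs if d.get(to_field) in keys]

docs = [
    {"id": "p1", "name": "T-shirt"},
    {"id": "p2", "name": "Hoodie"},
    {"id": "c1", "parent": "p1", "color": "Yellow"},
    {"id": "c2", "parent": "p2", "color": "Blue"},
]

# Equivalent of: select?q={!join from=parent to=id}color:Yellow
result = join(docs, "parent", "id", lambda d: d.get("color") == "Yellow")
print([d["id"] for d in result])  # ['p1']
```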
ElasticSearch Parent – Child
Proper indexing required
Indexed as separate documents
Standard queries don't return child documents
Query across the relationship using queries and filters: has_child and top_children return parent documents, has_parent returns child documents
Filters
Used to narrow down query results
Good candidates for caching and reuse
Apache Solr: additive; can use different query parsers; can use local params; narrows down faceting results
ElasticSearch: defined using the Query DSL; can be used for score calculation; doesn't narrow down faceting results
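Filters cache well because they do not affect scoring: a filter's result is just a set of matching document ids that can be stored once and intersected with any later query. The toy cache below (all names hypothetical, and only simple field:value filters) also shows the additive behaviour of several Solr fq parameters, each of which narrows the results further.

```python
class FilterCache:
    """Toy filter cache: filter string -> frozenset of matching doc ids."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.misses = 0

    def doc_ids(self, fq):
        if fq not in self.cache:
            self.misses += 1
            field, value = fq.split(":", 1)  # only simple field:value filters in this toy
            self.cache[fq] = frozenset(d["id"] for d in self.docs if d.get(field) == value)
        return self.cache[fq]

    def apply(self, query_hits, filters):
        """Intersect query hits with every filter, keeping the hits' original order."""
        allowed = set(query_hits)
        for fq in filters:
            allowed &= self.doc_ids(fq)
        return [h for h in query_hits if h in allowed]

docs = [
    {"id": "1", "type": "book", "lang": "en"},
    {"id": "2", "type": "book", "lang": "pl"},
    {"id": "3", "type": "ebook", "lang": "en"},
]
fc = FilterCache(docs)
print(fc.apply(["3", "2", "1"], ["type:book", "lang:en"]))  # ['1']
print(fc.apply(["1", "2"], ["type:book"]), fc.misses)       # ['1', '2'] 2
```

The second call reuses the cached type:book result, so the miss counter stays at 2.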
Faceting
Terms
Range and query
Terms statistics
Spatial distance
Pivot
Histograms
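At its core, a terms facet counts field values over the documents that match the query. The sketch below (plain Python, made-up data) also shows why it matters whether a filter narrows facets, the Solr vs ElasticSearch difference noted on the filters slide: counting before and after the filter gives different numbers.

```python
from collections import Counter

def terms_facet(docs, field, size=10):
    """Count values of `field` across docs, most frequent first."""
    return Counter(d[field] for d in docs if field in d).most_common(size)

matching = [
    {"id": "1", "tag": "solr", "year": 2013},
    {"id": "2", "tag": "elasticsearch", "year": 2013},
    {"id": "3", "tag": "solr", "year": 2012},
]

print(terms_facet(matching, "tag"))  # counts over all query hits
filtered = [d for d in matching if d["year"] == 2013]
print(terms_facet(filtered, "tag"))  # counts after a filter narrowed the hits
```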
Real Time or Not?
Get documents that are not yet searchable from the transaction log
No need to reopen the searcher
ElasticSearch: separate Get and Multi Get API
Apache Solr: separate Realtime Get handler
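The idea behind real-time get is to consult the transaction log, which already holds recently indexed documents, before falling back to what the current searcher can see, so no searcher reopen is needed. A minimal sketch under that assumption, with a hypothetical class; in this toy a reopen simply drains the log, which real engines do not necessarily do.

```python
class ToyEngine:
    """Hypothetical engine: a searchable view plus a transaction log of pending docs."""

    def __init__(self):
        self.searchable = {}  # what the current searcher can see
        self.tlog = {}        # docs indexed since the last searcher reopen

    def index(self, doc):
        self.tlog[doc["id"]] = doc  # logged, but not yet visible to search

    def reopen_searcher(self):
        self.searchable.update(self.tlog)  # pending docs become searchable
        self.tlog.clear()

    def search_by_id(self, doc_id):
        return self.searchable.get(doc_id)  # a normal search only sees the searcher

    def realtime_get(self, doc_id):
        if doc_id in self.tlog:  # the newest version lives in the log
            return self.tlog[doc_id]
        return self.searchable.get(doc_id)

engine = ToyEngine()
engine.index({"id": "12345", "enabled": True})
print(engine.search_by_id("12345"))  # None
print(engine.realtime_get("12345"))  # {'id': '12345', 'enabled': True}
```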
Data Handling
Single and batch indexing supported
ElasticSearch: JSON in / JSON out (and YAML)
Apache Solr: different formats allowed (XML, JSON, CSV, binary)
Partial Document Updates
Not based on LUCENE-3837
Server-side document reindexing
Both servers use versioning
Decreases network traffic
Apache Solr Partial Doc Update
Sent to the standard update handler
Requires the _version_ field
curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]'
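Server-side, a partial update like the one above boils down to: load the stored document, apply the per-field modifiers, and reindex the whole document under a new version. The toy below assumes only the set, add and inc modifiers and a simplified optimistic-concurrency check (reject when the caller's _version_ differs from the stored one). Real Solr needs all fields to be stored, generates versions differently, and has extra _version_ rules that are omitted here.

```python
import copy

class VersionConflict(Exception):
    """Raised when the caller's _version_ does not match the stored one."""

def atomic_update(store, update):
    """Toy server-side partial update: load, modify, reindex the whole document."""
    current = store[update["id"]]
    expected = update.get("_version_")
    if expected is not None and expected != current["_version_"]:
        raise VersionConflict(f"expected {expected}, found {current['_version_']}")

    new_doc = copy.deepcopy(current)
    for field, change in update.items():
        if field in ("id", "_version_"):
            continue
        [(op, value)] = change.items()  # exactly one modifier per field in this toy
        if op == "set":
            new_doc[field] = value
        elif op == "add":
            old = new_doc.get(field, [])
            new_doc[field] = (old if isinstance(old, list) else [old]) + [value]
        elif op == "inc":
            new_doc[field] = new_doc.get(field, 0) + value
        else:
            raise ValueError(f"unsupported modifier: {op}")

    new_doc["_version_"] = current["_version_"] + 1  # toy counter, not Solr's scheme
    store[update["id"]] = new_doc  # the whole document is reindexed
    return new_doc

store = {"12345": {"id": "12345", "enabled": False, "views": 10, "_version_": 1}}
atomic_update(store, {"id": "12345", "enabled": {"set": True}, "views": {"inc": 1}})
print(store["12345"])  # {'id': '12345', 'enabled': True, 'views': 11, '_version_': 2}
```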
ElasticSearch Partial Doc Update
Special endpoint exposed: _update
Supports parameters like routing, parent, replication, percolate, etc. (similar to the Index API)
Uses scripts to perform document updates
curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true }}'
Solr Collections API
Collection creation, reload, deletion, shard splitting
ElasticSearch Indices REST API
Index creation, deletion, closing and opening, refreshing, existence checking
Apache Solr Shard Splitting
admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
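With Solr's default compositeId router, each shard owns a contiguous range of signed 32-bit hash values, and SPLITSHARD divides the parent shard's range into two halves that become new sub-shards. The arithmetic can be sketched as below; the helper names are illustrative, and ranges are printed as unsigned hex, the form Solr uses for shard ranges in its cluster state.

```python
MIN_HASH, MAX_HASH = -2**31, 2**31 - 1  # signed 32-bit hash space

def split_range(lo: int, hi: int):
    """Split an inclusive hash range into two contiguous, non-overlapping halves."""
    if lo >= hi:
        raise ValueError("range too small to split")
    mid = lo + (hi - lo) // 2
    return [(lo, mid), (mid + 1, hi)]

def to_hex(value: int) -> str:
    """Render a signed 32-bit value as unsigned hex."""
    return format(value & 0xFFFFFFFF, "x")

for lo, hi in split_range(MIN_HASH, MAX_HASH):  # a collection with a single shard1
    print(f"{to_hex(lo)}-{to_hex(hi)}")
# 80000000-ffffffff
# 0-7fffffff
```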
Cluster State Monitoring
Multiple MBeans exposed by JMX
Multiple REST endpoints exposed to get different statistics
ElasticSearch Statistics API
Health and state check
Nodes information
Cache statistics
Segments information
Index information
Mappings information
SPM: "One to rule them all"
ElasticSearch Cluster Settings Update
Control rebalancing, recovery, allocation
Change cluster configuration properties
ElasticSearch Custom Shard Allocation
Cluster level:
curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.exclude._ip" : "192.168.2.1" }}'
Index level:
curl -XPUT localhost:9200/sematext/_settings/ -d '{ "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"}'
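The two settings above act as filters on where shards may be placed: exclude._ip removes a node by address for the whole cluster, and include.tag restricts the sematext index to nodes whose tag attribute is in the list. The toy decider below mirrors just those two checks; it assumes each node was started with a custom tag attribute, and the node list is made up.

```python
def can_allocate(node, include_tags=None, exclude_ips=()):
    """Return True if a shard may be placed on `node` under include/exclude filters."""
    if node["ip"] in exclude_ips:
        return False  # cluster-level exclusion by IP
    if include_tags is not None and node.get("tag") not in include_tags:
        return False  # index-level inclusion by tag attribute
    return True

nodes = [
    {"name": "node1", "ip": "192.168.2.1", "tag": "nodeOne"},
    {"name": "node2", "ip": "192.168.2.2", "tag": "nodeTwo"},
    {"name": "node3", "ip": "192.168.2.3", "tag": "nodeThree"},
]
eligible = [
    n["name"] for n in nodes
    if can_allocate(n, include_tags={"nodeOne", "nodeTwo"}, exclude_ips={"192.168.2.1"})
]
print(eligible)  # ['node2']
```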
Moving Shards and Replicas
Move shards between nodes on demand
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}} ] }'
The Verdict
And The Winner Is?
The Users
We Are Hiring!
Dig search?
Dig analytics?
Dig big data?
Dig performance?
Dig working with and in open source?
We're hiring worldwide!
http://sematext.com/about/jobs.html
Rafał Kuć @kucrafal
Sematext @sematext http://sematext.com http://blog.sematext.com
ElasticSearch Server 25% off: MREESS25
Thank You!