Date posted: 11-Aug-2014
Category: Data & Analytics
Uploaded by: sematext-group-inc
(Elastic)search in Big Data
Radu Gheorghe
@radu0gheorghe @sematext
Agenda
What is “search in Big Data”? Challenges?
Some solutions?
How does Elasticsearch do it?
Search Expectations
Relevancy...
iphone
headphones for iPhone 4, iPhone 5, iPhone 6 and iPhone 7
iPhone 5
iPhone 4
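A minimal sketch of why the short titles should outrank the long headphones title for the query "iphone". It is only loosely modelled on the term-frequency and length-normalization factors of Lucene's classic scoring, with everything else (such as inverse document frequency) left out, so treat the formula as an illustrative assumption and not as what Elasticsearch actually computes.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, title):
    """Toy relevance score: sqrt(term frequency) / sqrt(title length).

    Assumption: a simplified stand-in for real scoring, for illustration only.
    """
    tokens = tokenize(title)
    if not tokens:
        return 0.0
    total = 0.0
    for term in tokenize(query):
        freq = tokens.count(term)
        total += math.sqrt(freq) / math.sqrt(len(tokens))
    return total


def rank(query, titles):
    """Return titles ordered from most to least relevant."""
    return sorted(titles, key=lambda t: score(query, t), reverse=True)


titles = [
    "headphones for iPhone 4, iPhone 5, iPhone 6 and iPhone 7",
    "iPhone 5",
    "iPhone 4",
]
ranked = rank("iphone", titles)
for title in ranked:
    print(round(score("iphone", title), 3), title)
```

The long title mentions "iphone" four times, but the length normalization penalizes it enough that the two phone titles come first.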
...and autocomplete...
iph
iphone
iphone 5
Institute of Public Health
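A small sketch of how "iph" could produce the three suggestions on the slide. It assumes suggestions match either a prefix of the whole term or a prefix of its initials (which would explain "Institute of Public Health"); that second rule is a guess, and the stop word list is hypothetical.

```python
# Hypothetical stop words that are skipped when building initials.
STOP_WORDS = {"of", "the", "and", "for"}


def initials(term):
    """Return the first letters of the significant words, e.g. 'iph'."""
    words = [w for w in term.lower().split() if w not in STOP_WORDS]
    return "".join(w[0] for w in words)


def suggest(prefix, terms):
    """Return terms whose text or initials start with the typed prefix."""
    p = prefix.lower()
    return [
        t for t in terms
        if t.lower().startswith(p) or initials(t).startswith(p)
    ]


terms = ["iphone", "iphone 5", "Institute of Public Health", "Galaxy S4"]
suggestions = suggest("iph", terms)
print(suggestions)
```

A linear scan like this is fine for a handful of terms; a real suggester would use a precomputed structure so it can answer on every keystroke.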
...and fuzziness...
iphnoe
No results found for “iphnoe”
iPhone 5
iPhone 4
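Fuzzy matching is commonly based on edit distance. The sketch below uses an edit distance that counts a swap of two adjacent letters as one edit (optimal string alignment), so "iphnoe" is one edit away from "iphone". The limit of two edits and the vocabulary are assumptions made for the example.

```python
def edit_distance(a, b):
    """Edit distance where an adjacent transposition counts as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped, e.g. "no" vs "on".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_match(query, vocabulary, max_edits=2):
    """Return vocabulary terms within max_edits of the query."""
    q = query.lower()
    return [t for t in vocabulary if edit_distance(q, t.lower()) <= max_edits]


vocabulary = ["iphone", "ipad", "galaxy"]
print(edit_distance("iphnoe", "iphone"))
print(fuzzy_match("iphnoe", vocabulary))
```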
...and corrections...
iphnoe
Did you mean “iPhone”?
iPhone 5
iPhone 4
shows results anyway
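A "Did you mean" suggestion can be sketched with the Python standard library: pick the closest known term, then run the search with it so results show anyway. The vocabulary, the 0.6 similarity cutoff and the helper name `did_you_mean` are assumptions for illustration; a search engine would normally draw candidates from its own index.

```python
import difflib


def did_you_mean(query, vocabulary, cutoff=0.6):
    """Return the closest known term, or None if the query needs no fix."""
    q = query.lower()
    if q in vocabulary:
        return None  # already a known term, nothing to correct
    matches = difflib.get_close_matches(q, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


vocabulary = ["iphone", "ipad", "galaxy"]
correction = did_you_mean("iphnoe", vocabulary)
if correction:
    print(f"Did you mean {correction!r}?")
```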
...and similar terms...
iphone
iPhone 5
iPhone 4
iPhone 3
Galaxy S4
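One way a query for "iphone" could also return a Galaxy S4 is synonym expansion. The sketch below assumes a hand-written synonym map and per-product keyword sets, both invented for the example, and lists exact matches before related ones.

```python
def search_with_synonyms(query, docs, synonyms):
    """Return exact keyword matches first, then matches on related terms."""
    q = query.lower()
    expanded = {q} | synonyms.get(q, set())
    exact, related = [], []
    for title, keywords in docs.items():
        if q in keywords:
            exact.append(title)
        elif expanded & keywords:
            related.append(title)
    return exact + related


# Hypothetical catalogue and synonym map, for illustration only.
docs = {
    "iPhone 5": {"iphone", "5", "smartphone"},
    "iPhone 4": {"iphone", "4", "smartphone"},
    "iPhone 3": {"iphone", "3", "smartphone"},
    "Galaxy S4": {"galaxy", "s4", "smartphone"},
    "USB cable": {"usb", "cable"},
}
synonyms = {"iphone": {"smartphone"}}

results = search_with_synonyms("iphone", docs, synonyms)
print(results)
```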
...and don’t forget the statistics!
iphone
iPhone 5
iPhone 4
☑ iOS ☐ other
☑ <100 RON ☐ 100-200 RON ☐ >200 RON
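The checkboxes on this slide are facets: counts of matching products per category. A minimal sketch follows, assuming a tiny in-memory catalogue with invented prices and a simple substring match on the product name.

```python
from collections import Counter


def price_bucket(price):
    """Map a price in RON to one of the three ranges shown on the slide."""
    if price < 100:
        return "<100 RON"
    if price <= 200:
        return "100-200 RON"
    return ">200 RON"


def facets(query, products):
    """Count matching products per OS group and per price range."""
    q = query.lower()
    matching = [p for p in products if q in p["name"].lower()]
    os_counts = Counter(
        "iOS" if p["os"] == "iOS" else "other" for p in matching
    )
    price_counts = Counter(price_bucket(p["price"]) for p in matching)
    return os_counts, price_counts


# Hypothetical products; names echo the slides, prices are made up.
products = [
    {"name": "iPhone 5", "os": "iOS", "price": 250},
    {"name": "iPhone 4", "os": "iOS", "price": 150},
    {"name": "headphones for iPhone", "os": None, "price": 80},
    {"name": "Galaxy S4", "os": "Android", "price": 220},
]

os_counts, price_counts = facets("iphone", products)
print(dict(os_counts))
print(dict(price_counts))
```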
Wait. Fancy search == Big Data?
Fancy stuff isn’t free
iphone
☑ iOS ☐ other
☑ <100 RON ☐ 100-200 RON ☐ >200 RON
Did you mean...
iPhone 5, iPhone 4, iPhone 3, Galaxy S4
N requests for autocomplete
1 request for each of the stats
1 request for synonyms, 1 for exact matches, etc.
1 request for corrections
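To put a number on the list above, here is a rough tally of backend requests behind a single user search. It assumes one autocomplete request per typed character (the slide only says N) and two result variants (synonyms and exact matches); both are assumptions, and the "etc." on the slide suggests the real count can be higher.

```python
def requests_per_search(query, num_stats, result_variants=2):
    """Estimate backend requests triggered by one user search."""
    breakdown = {
        "autocomplete": len(query),   # assumed: one request per keystroke
        "stats": num_stats,           # one request per statistic (facet)
        "result variants": result_variants,  # synonyms + exact matches
        "corrections": 1,
    }
    return breakdown, sum(breakdown.values())


# "iphone" with two statistics (OS and price range), as on the slide.
breakdown, total = requests_per_search("iphone", num_stats=2)
print(breakdown, total)
```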
Distributed search. When one server doesn’t cut it
Log Search
web_server01
database01
backend01
search engine
error
10:01 - webapp - DB connect error
10:00 - DB - I/O error
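The log search slide can be sketched as an inverted index: every word points to the log lines (and hosts) that contain it, so a search for "error" finds entries from all servers at once. Which host produced which line is an assumption here, and the backend01 line is invented to show a non-matching entry.

```python
import re
from collections import defaultdict


def tokenize(line):
    """Split a log line into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", line.lower()))


def build_index(logs_by_host):
    """Map each token to the set of (host, line) pairs containing it."""
    index = defaultdict(set)
    for host, lines in logs_by_host.items():
        for line in lines:
            for token in tokenize(line):
                index[token].add((host, line))
    return index


def search(index, term):
    """Return matching (host, line) pairs, newest first.

    Lines start with HH:MM, so sorting by the line text sorts by time.
    """
    hits = index.get(term.lower(), set())
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


logs_by_host = {
    "web_server01": ["10:01 - webapp - DB connect error"],
    "database01": ["10:00 - DB - I/O error"],
    "backend01": ["10:02 - backend - job finished"],  # invented line
}

index = build_index(logs_by_host)
for host, line in search(index, "error"):
    print(host, "|", line)
```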
Log Analytics
unique IPs: 7584
best sellers
iPhone 5
iPhone 4
Galaxy S4
users per country
Romania: 200
France: 150
Hungary: 120
revenue per day
Distributed search solutions
Elasticsearch
Solr
Others: SenseiDB, Sphinx…
SaaS: CloudSearch, Logsene...
Elasticsearch
built on top of Lucene
Document-oriented
Lucene awesome: index & store data, relevancy, fuzzy, suggesters...
...all wrapped up in JSON over HTTP
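As an illustration of "JSON over HTTP", the sketch below builds (but does not send) a search request with a fuzzy match query. The index name `products` and the field `name` are hypothetical, and the URL assumes a local node on Elasticsearch's default HTTP port 9200.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, field, text):
    """Build a POST request for a fuzzy match query on one field."""
    body = {
        "query": {
            "match": {
                field: {"query": text, "fuzziness": "AUTO"}
            }
        }
    }
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index and field names.
req = build_search_request("http://localhost:9200", "products", "name", "iphnoe")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it to a running cluster; that step is left out so the example runs without one.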
Aggregations
revenue per day
unique IPs: 7584
unique IPs per country
Romania: 200
France: 150
Hungary: 120
unique IPs per country per day
Romania
unique IPs: 7584
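The nesting shown here (unique IPs, then per country, then per country per day) can be sketched in plain Python with sets grouped by key. The events are invented. The last dictionary shows what a comparable Elasticsearch request body could look like using a `terms` aggregation with a `cardinality` sub-aggregation; the field names `country` and `ip` are assumptions.

```python
from collections import defaultdict


def unique_ips_per_country(events):
    """Count distinct IPs for each country."""
    ips = defaultdict(set)
    for _day, country, ip in events:
        ips[country].add(ip)
    return {country: len(seen) for country, seen in ips.items()}


def unique_ips_per_country_per_day(events):
    """Count distinct IPs for each (country, day) pair, nested by country."""
    ips = defaultdict(lambda: defaultdict(set))
    for day, country, ip in events:
        ips[country][day].add(ip)
    return {
        country: {day: len(seen) for day, seen in days.items()}
        for country, days in ips.items()
    }


# Invented (day, country, ip) events.
events = [
    ("2014-08-10", "Romania", "1.1.1.1"),
    ("2014-08-10", "Romania", "1.1.1.2"),
    ("2014-08-11", "Romania", "1.1.1.1"),
    ("2014-08-10", "France", "2.2.2.1"),
    ("2014-08-11", "France", "2.2.2.1"),
]

per_country = unique_ips_per_country(events)
per_country_per_day = unique_ips_per_country_per_day(events)
print(per_country)
print(per_country_per_day)

# Sketch of a nested aggregation request body (assumed field names).
# Note: the cardinality aggregation returns an approximate count.
es_body = {
    "size": 0,
    "aggs": {
        "per_country": {
            "terms": {"field": "country"},
            "aggs": {"unique_ips": {"cardinality": {"field": "ip"}}},
        }
    },
}
```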
Node 1
Node 1, Node 2
Node 1, Node 2, Node 3
Node 1, Node 2
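The node slides appear to show a cluster growing to three nodes and then losing one. A simplified sketch of the idea follows, assuming documents are routed to a shard by hashing their ID modulo the number of shards, shards are spread round-robin over the available nodes, and a search merges the best hits from every shard. The CRC32 hash and the round-robin placement are stand-ins chosen for the example, not the actual Elasticsearch algorithms.

```python
import heapq
import zlib


def shard_for(doc_id, num_shards):
    """Route a document to a shard: hash(id) modulo number of shards."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_shards(num_shards, nodes):
    """Spread shards over nodes round-robin (simplified placement)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


def shards_per_node(assignment):
    """Count how many shards each node holds."""
    counts = {}
    for node in assignment.values():
        counts[node] = counts.get(node, 0) + 1
    return counts


def scatter_gather(hits_per_shard, k):
    """Merge per-shard (score, doc) hits into the global top k."""
    all_hits = (hit for hits in hits_per_shard for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


three_nodes = assign_shards(6, ["Node 1", "Node 2", "Node 3"])
two_nodes = assign_shards(6, ["Node 1", "Node 2"])  # after losing Node 3
print(shards_per_node(three_nodes))
print(shards_per_node(two_nodes))

top = scatter_gather([[(0.9, "a"), (0.2, "b")], [(0.7, "c")]], k=2)
print(top)
```

The number of shards stays fixed while nodes come and go, which is why routing by `hash % num_shards` keeps working as the cluster changes size.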
Big Data distributed search
search and real-time analytics
more search features
clients
usage (logs)
Thank you!
[email protected] @radu0gheorghe @sematext