Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Multitenant Search JDD 2014, Krakow - PL
Pablo Barros Applications Architect October 14, 2014
• The opinions and views expressed in this talk are my own, and do not necessarily reflect the opinions or views of my employer.
About me
Giveaway
• Elasticsearch Server – Second Edition – By Rafal Kuc
Agenda
1. Key Concepts and Pitfalls of Multitenancy
2. Designing the Search Index
3. Defining the Cluster Topology
4. Integrating with your Application
5. Q&A
Overview: Key Concepts and Pitfalls of Multitenant Search
Defining Multitenancy
“Single software instance serving multiple customers.”
Benefits
• Sharing of Resources
• Lower Costs
• Easier Horizontal Scaling
• Quicker onboarding of new Customers
• Data Aggregation
• Simpler Release Processes
• “Green”
Pitfalls & Risks
• Resource Sharing Limits
• Requires more Customization capabilities
• Higher Complexity
• Data Security
Search Engine Topology
[Diagram: Your Application, Search Engine with Search Clusters 1..N, and a Hub/Tribe Node; connections labeled Read/Write and Read]
Designing the Search Index
Index Logical Granularity
[Figure: shared indices vs. dedicated indices]
Shared Indices
• Schema-less Index
• Pros
  – True Global search
  – Intermixed Results across Customers/Entities
• Cons
  – Cross Tenant Data Security
  – Weaker data separation
  – Index corruption can affect entire Search
  – Diminished ability to index data in parallel
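In a shared index, cross-tenant data security usually comes down to forcing a tenant filter onto every query. The following is a minimal sketch, assuming each document carries a hypothetical `tenant_id` field and that the engine accepts an Elasticsearch-style `bool` query body as found in recent Elasticsearch versions. The helper name `scope_to_tenant` is made up for illustration.

```python
def scope_to_tenant(user_query: dict, tenant_id: str) -> dict:
    """Wrap a user-supplied query so it can only match one tenant's documents."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every search")
    return {
        "query": {
            "bool": {
                "must": [user_query],
                # Filter clauses restrict matches without affecting scoring.
                # "tenant_id" is an assumed field name, not from the talk.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }


body = scope_to_tenant({"match": {"text": "droids"}}, "tenant-42")
```

Applying the wrapper in one place on the server side, rather than trusting each caller to add the filter, keeps a forgotten filter from turning into a data leak.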
Dedicated Indices
• Pros
  – Better data separation
  – More modular/portable
  – Better parallel indexing capabilities
• Cons
  – More storage
  – Global search is more limited
    • However, some search engines allow searching across indexes and even across clusters (Elasticsearch Tribe Node)
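With dedicated indices, tenant separation moves into the index name, and a global search becomes a search over several indices at once. A sketch, assuming a hypothetical `<tenant>-<entity>` naming scheme. Elasticsearch accepts a comma-separated list of index names in the search URL, which is what `search_path` builds; both function names are illustrative.

```python
import re
from typing import Sequence

# Assumed naming rule: lowercase letters, digits, "_" and "-" only.
_SAFE = re.compile(r"[a-z0-9][a-z0-9_-]*")


def index_name(tenant: str, entity: str) -> str:
    """Build the per-tenant index name, rejecting parts that could escape the scheme."""
    for part in (tenant, entity):
        if not _SAFE.fullmatch(part):
            raise ValueError(f"unsafe index name part: {part!r}")
    return f"{tenant}-{entity}"


def search_path(tenants: Sequence[str], entity: str) -> str:
    """URL path for searching one entity type across several tenants' indices."""
    return "/" + ",".join(index_name(t, entity) for t in tenants) + "/_search"


path = search_path(["acme", "globex"], "orders")
```

Validating the name parts matters because wildcards or commas in a tenant identifier would otherwise widen the search to other tenants' indices.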
Indexing Process & Storage
Token   Pointer
Droid   1, 2, 3
Look    2, 3
Rain    1

Doc 1: …
Doc 2: …
Doc 3: …
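The token-to-pointer table is an inverted index. A toy version can be sketched as follows; the three documents are hypothetical, already-normalized stand-ins chosen only so that the result matches the table, since the slide does not show the real document text.

```python
from collections import defaultdict

# Hypothetical documents (already lowercased and stemmed), not from the talk.
docs = {
    1: "droid rain",
    2: "droid look",
    3: "look droid",
}


def build_inverted_index(documents: dict) -> dict:
    """Map each token to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


index = build_inverted_index(docs)
```

A lookup for `droid` then returns documents 1, 2 and 3 without scanning any document text.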
Indexing with Storage Enabled
{
  "total": 2,
  "hits": [
    {
      "id": 1,
      "text": "These are not the <b>droids</b> you are looking for.",
      "date": "2014/10/8"
    },
    {
      "id": 2,
      "text": "However, those are the <b>droids</b> you are looking for.",
      "date": "2014/10/8"
    }
  ]
}
Indexing with Storage Disabled
{
  "total": 2,
  "hits": [
    {
      "id": 1
    },
    {
      "id": 2
    }
  ]
}
Storing and Retrieving Original Indexed Data
Full Document
• Pros
  – Avoid hitting database on your application
  – Snippet highlighting
• Cons
  – Extra storage on Search Engine file system
  – Access control needs to be built in the index

IDs Only
• Pros
  – Storage on Search Engine file system is light
  – Small response payload
• Cons
  – Reliance on database for reading data to show users
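The "IDs Only" option implies a second step in the application: load the rows for the returned IDs and apply access control there. A rough sketch, where the dictionary standing in for the database and the `owner` field are assumptions made for illustration:

```python
# Hypothetical stand-ins: a search response that carries IDs only, and an
# application database keyed by document ID.
search_response = {"total": 2, "hits": [{"id": 1}, {"id": 2}]}

database = {
    1: {"text": "These are not the droids you are looking for.", "owner": "tenant-a"},
    2: {"text": "However, those are the droids you are looking for.", "owner": "tenant-b"},
}


def hydrate(response: dict, db: dict, tenant: str) -> list:
    """Load full rows for the returned IDs, keeping only rows the tenant may see."""
    rows = []
    for hit in response["hits"]:
        row = db.get(hit["id"])
        # Access control stays in the application, not in the index.
        if row is not None and row["owner"] == tenant:
            rows.append({"id": hit["id"], "text": row["text"]})
    return rows


visible = hydrate(search_response, database, "tenant-a")
```

The trade-off from the comparison shows up directly: the response payload is tiny, but every result page costs a database read.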
Parent-child Relationships
• Defines 1-to-many relationship between entries in different indices
• Convenient when pushing relational data into Index
• Parent can be updated without re-indexing children
[Diagram: Customer (1) to Order (0..*)]
Multiple Languages
• Leverage language auto-detection
• Leverage stop words
• Limit amount of stemming
• Option:
  – Single entry in multiple languages
    • Merge value in different languages into single field
    • Pro: Simple implementation. Search can be performed in any language
    • Con: Match might include homonym in other languages
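The single-field option can be sketched in a few lines. The field name `title_all` and the sample translations are hypothetical.

```python
def merge_languages(translations: dict) -> str:
    """Concatenate all language variants of a value into one searchable field."""
    # Sorting by language code keeps the merged value stable between re-indexes.
    return " ".join(translations[lang] for lang in sorted(translations))


# Hypothetical product title in three languages.
title = {"en": "gift", "de": "Geschenk", "pl": "prezent"}
doc = {"id": 1, "title_all": merge_languages(title)}
```

The example also shows the homonym problem: "Gift" means "poison" in German, so a German-language search for that word would match this English title.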
Defining the Search Cluster Topology
Shards
[Diagram: shards 1, 2, 3, 4 and 5 spread across Node 1 and Node 2]
Replicas
[Diagram: Node 1 and Node 2 holding shards 1 to 5 plus replicas 1R to 5R]
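As a rough illustration of sharding and replication, the sketch below routes a document to one of five shards and keeps each replica on a different node than its primary. The CRC32 hash and the round-robin placement are assumptions made for this example; they are not the hash or allocation algorithm of any particular search engine, and the resulting layout differs from the one drawn on the slide.

```python
import zlib

NUM_SHARDS = 5
NODES = ["node-1", "node-2"]


def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard (numbered from 1) from a stable hash of the routing key."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards + 1


def place(num_shards: int = NUM_SHARDS, nodes=NODES) -> dict:
    """Assign each primary round-robin and put its replica on the next node."""
    layout = {}
    for shard in range(1, num_shards + 1):
        primary = nodes[(shard - 1) % len(nodes)]
        replica = nodes[shard % len(nodes)]
        layout[shard] = {"primary": primary, "replica": replica}
    return layout


layout = place()
```

With at least two nodes, a primary and its replica never share a node, so losing one node still leaves a copy of every shard.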
Cluster
• Approach depends on what your framework has to offer
• Elasticsearch provides a lot of support out of the box
• Considerations:
  – Cluster Segmentation (Few Smaller vs Single Large?)
  – Geographical Distribution
  – Searching across Clusters
  – Write/Read Ratio
Hub
• Aware of All Clusters
• Maintains map of Tenant -> Cluster
• Serves as discovery mechanism for the Client Application
• Able to create/pause/move/delete Tenants
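The Hub responsibilities listed here could be modeled roughly as follows. This is an in-memory sketch with hypothetical class, method and cluster names; a real hub would persist the map and move the tenant's data as well.

```python
class TenantHub:
    """In-memory sketch of a hub that maps tenants to search clusters."""

    def __init__(self, clusters):
        self.clusters = set(clusters)  # the hub is aware of all clusters
        self.tenants = {}              # tenant -> {"cluster": ..., "paused": ...}

    def _check_cluster(self, cluster):
        if cluster not in self.clusters:
            raise KeyError(f"unknown cluster: {cluster}")

    def create(self, tenant, cluster):
        self._check_cluster(cluster)
        if tenant in self.tenants:
            raise ValueError(f"tenant already exists: {tenant}")
        self.tenants[tenant] = {"cluster": cluster, "paused": False}

    def pause(self, tenant):
        self.tenants[tenant]["paused"] = True

    def move(self, tenant, cluster):
        self._check_cluster(cluster)
        # Only the map changes here; copying the indices is out of scope.
        self.tenants[tenant]["cluster"] = cluster

    def delete(self, tenant):
        del self.tenants[tenant]

    def discover(self, tenant):
        """Return the cluster a client should talk to, or None while paused."""
        entry = self.tenants[tenant]
        return None if entry["paused"] else entry["cluster"]


hub = TenantHub(["cluster-eu", "cluster-us"])
hub.create("acme", "cluster-eu")
```

The client application only ever calls `discover`, which keeps tenant placement decisions out of application code.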
Hub Tenant Discovery Service
Integrating with your Application
Indexing vs Querying
• Expected load of writes vs. reads
• Depends on Problem Domain of Client Application
• Writes are expensive!
  – Especially if not done in bulk
• Reads are fairly cheap
Initial Data Load/Full Re-index
• Perform Actions in Bulk
  – Minimize overall number of Lucene Commits
• Consider enabling External “Versioning”
  – Safely parallelize indexing requests
• Keep track of documents that failed to index
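The three points above can be combined in one small sketch: split the load into batches, let an externally supplied version number decide whether a write is accepted, and collect the failures. `VersionedIndex` is a toy stand-in for the search engine, not a client API; with a real engine, each batch would be sent as a single bulk request.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class VersionedIndex:
    """Toy index that accepts a write only if its external version is newer."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, version, source):
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise ValueError(f"stale version {version} for document {doc_id}")
        self.docs[doc_id] = (version, source)


def bulk_load(index, actions, batch_size=2):
    """Index actions batch by batch and return the ones that failed."""
    failed = []
    for batch in batches(actions, batch_size):
        for doc_id, version, source in batch:
            try:
                index.index(doc_id, version, source)
            except ValueError as err:
                failed.append((doc_id, version, str(err)))
    return failed


# Hypothetical actions: version 1 of document 1 arrives after version 2.
actions = [(1, 2, "new"), (2, 1, "only"), (1, 1, "old")]
idx = VersionedIndex()
failures = bulk_load(idx, actions)
```

Because the late-arriving older version is rejected rather than applied, parallel workers can deliver writes out of order without overwriting newer data, and the returned list doubles as the record of documents to inspect or retry.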
Incremental Indexing
• Monitor the delay of indexing requests
• Message customers accordingly
  – e.g. Search Results might not include recently updated entries.
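One way to monitor the delay is to track the age of the oldest indexing request that has not been confirmed yet. A sketch under that assumption; the class name and the 30-second threshold are illustrative, and timestamps are passed in explicitly to keep the example deterministic (in practice they might come from `time.monotonic()`).

```python
from collections import deque


class IndexingQueue:
    """Tracks pending indexing requests so the application can report lag."""

    def __init__(self, warn_after_seconds=30.0):
        self.pending = deque()  # (enqueued_at, doc_id), oldest first
        self.warn_after_seconds = warn_after_seconds

    def enqueue(self, doc_id, now):
        self.pending.append((now, doc_id))

    def mark_indexed(self):
        """Remove the oldest pending request once the engine confirms it."""
        return self.pending.popleft()[1]

    def delay(self, now):
        """Age in seconds of the oldest request still waiting to be indexed."""
        return now - self.pending[0][0] if self.pending else 0.0

    def notice(self, now):
        """Message to show customers while indexing is lagging, else None."""
        if self.delay(now) > self.warn_after_seconds:
            return "Search results might not include recently updated entries."
        return None


queue = IndexingQueue(warn_after_seconds=30.0)
queue.enqueue("order-1", now=100.0)
```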
Disaster Recovery
• Take advantage of what your Framework offers you
  – e.g. Replication in Elasticsearch
• Nightly Backups + Replay of changes since Backup creation
• Avoid starting from scratch!
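The "backup plus replay" idea can be sketched as follows, assuming the application keeps its own timestamped change log. The record layout (`at`, `op`, `id`, `doc`) is hypothetical.

```python
def restore(backup, change_log):
    """Rebuild an index from a backup plus the changes recorded after it."""
    index = dict(backup["docs"])
    replayed = 0
    for change in sorted(change_log, key=lambda c: c["at"]):
        if change["at"] <= backup["taken_at"]:
            continue  # already contained in the backup
        if change["op"] == "delete":
            index.pop(change["id"], None)
        else:
            index[change["id"]] = change["doc"]
        replayed += 1
    return index, replayed


# Hypothetical nightly backup and application-side change log.
backup = {"taken_at": 100, "docs": {1: "a", 2: "b"}}
change_log = [
    {"at": 90, "op": "index", "id": 1, "doc": "a"},
    {"at": 120, "op": "index", "id": 3, "doc": "c"},
    {"at": 130, "op": "delete", "id": 2},
]
index, replayed = restore(backup, change_log)
```

Only the two changes made after the backup are replayed, which is what avoids a full re-index from scratch.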
Final Thoughts
• Recent Open-source Tooling (*cough* Elasticsearch) makes it easy
• Consider Carefully:
  – Design and granularity of your tenant in the Search engine
  – Entities and their Relationships
  – Sharding and Replication Schemes
  – Clustering Distribution
    • e.g. per application Installation, geographically, etc.
  – High Availability Mechanisms
Q&A Thank you!