Date post: | 07-Jan-2017 |
Category: | Technology |
Upload: | lucidworks |
Ubiquitous Solr - A Database’s not-so-evil Twin
Ayon Sinha
Data Foundation @WalmartLabs
October 13-16, 2016 • Austin, TX
Search Engine… Lucene… Solr
• Internet and Intranet Search
• Relevance
• Search Suggestions
• Faceting
• Recommendations
• Time series
• Log search
• Geo-spatial search
• Analytics
• Graph search
• Document Store
Overview
• How to scale any data infrastructure with Apache Solr
• Build a high-performance, highly available data platform for internal and external users alike
• Walmart’s commitment to open source
About me
• Team lead of the Data Foundation team at the largest retailer and the largest private employer in the world
• Prior to Walmart, worked at startups building recommendation and analytics systems
• Before that, spent 6 years building search applications, recommendation systems and Hadoop-based analytics systems for the largest online auction company, eBay
• Have been a manuscript reviewer for Manning Publications for 4 years and have helped shape the contents of “Hadoop in Practice” and “Big Data”
About Walmart
• 11,000+ Stores in 27 countries
• 11 eCommerce sites
• 250M customers weekly in stores and online
• Millions of database transactions per day
• Sales, holidays and massive volume shifts
Users REALLY like this..
Higher volume, more use cases. Quick-fix scaling alternatives add some headroom … and complexity
We need more Business Intelligence
Business is looking good, but the source-of-truth data store, not so much …
Design to scale out
• Offload queries to Search Engines
• Offload recurring reads to Cache
• Offload analytics to OLAP datastores
• Shard the database
… and of course do something to hide the complexity. It is worth it.
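The offloading idea above can be sketched as a small read router. This is a minimal illustration, not the speaker's implementation: the `ReadRouter` class, its in-memory cache, and the `search` and `sot_lookup` callables are all hypothetical stand-ins for a real cache, a search engine client, and the source-of-truth database.

```python
from typing import Callable, Dict, Optional

Lookup = Callable[[str], Optional[dict]]


class ReadRouter:
    """Hypothetical read path: cache first, then search engine, then SoT database."""

    def __init__(self, search: Lookup, sot_lookup: Lookup) -> None:
        self.cache: Dict[str, dict] = {}   # stand-in for a real cache tier
        self.search = search               # stand-in for a search engine client
        self.sot_lookup = sot_lookup       # stand-in for the source-of-truth database
        self.sot_reads = 0                 # counts reads that reached the database

    def get(self, key: str) -> Optional[dict]:
        # 1. Recurring reads are served from the cache.
        if key in self.cache:
            return self.cache[key]
        # 2. Other reads go to the search engine copy of the data.
        doc = self.search(key)
        # 3. Only on a miss does the read fall through to the database.
        if doc is None:
            self.sot_reads += 1
            doc = self.sot_lookup(key)
        if doc is not None:
            self.cache[key] = doc
        return doc
```

Callers only see `get`, which is one way to hide the added complexity behind a single interface.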
The “not-so-evil” Twin to protect your Source of Truth DB
• What if a copy of your source-of-truth data is available … just about anywhere you want it?
• How could you use a search engine to protect and augment your database?
– Redirect queries
• Helps scale by reducing demand for
– database indexing
– database connections
– scarce database resources like memory, storage
• Not-so-evil Twin
– Adding multiple near real-time search copies adds complexity … and it comes at a cost; but done right, the benefits far outweigh the costs
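As a rough illustration of redirecting a query, a lookup that would otherwise hit the database can be expressed as a request to Solr's `/select` handler. The sketch below only builds the request URL. The base URL, the `orders` collection, and the field names in the usage note are assumptions for illustration; `q`, `fq`, `rows` and `wt` are standard Solr query parameters.

```python
from typing import Iterable
from urllib.parse import urlencode


def solr_select_url(base_url: str, collection: str, q: str,
                    filters: Iterable[str] = (), rows: int = 10) -> str:
    """Build a Solr /select URL for a query that is redirected away from the database."""
    params = [("q", q), ("rows", str(rows)), ("wt", "json")]
    # Each filter query (fq) narrows the result set without affecting scoring.
    params += [("fq", f) for f in filters]
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"
```

For example, `solr_select_url("http://localhost:8983/solr", "orders", "customer_id:42", filters=["status:OPEN"], rows=5)` yields a URL that an HTTP client could fetch instead of opening a database connection.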
Our Approach
• Abstract the complexity of managing
– source-of-truth database
– cache coherence
– search queries
– message bus
• Abstract Connection pool management
• Provide a scalable way to query across shards with full control of Solr schema
• And a way to analyze big data without affecting real-time systems, while isolating individual data domains
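Querying across shards can be pictured as scatter-gather: send the query to every shard, then merge the partial results into one ranked list. The sketch below is a toy model under stated assumptions, not the platform described in the talk. `make_shard` and `query_across_shards` are hypothetical names, each shard is an in-memory dictionary, and the score is a simple term count.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

Hit = Tuple[float, str]                      # (score, document id)
Shard = Callable[[str, int], List[Hit]]      # search(query, k) -> top-k hits


def make_shard(docs: Dict[str, str]) -> Shard:
    """Toy in-memory shard that scores a document by how often the term occurs."""
    def search(query: str, k: int) -> List[Hit]:
        term = query.lower()
        hits = [(float(text.lower().split().count(term)), doc_id)
                for doc_id, text in docs.items()]
        return heapq.nlargest(k, [h for h in hits if h[0] > 0])
    return search


def query_across_shards(shards: Iterable[Shard], query: str, k: int) -> List[Hit]:
    """Scatter the query to every shard, then gather and keep the global top k."""
    partial: List[Hit] = []
    for shard in shards:
        # Each shard returns at most k hits, so the merge stays small.
        partial.extend(shard(query, k))
    return heapq.nlargest(k, partial)
```

Asking each shard for its own top `k` is enough to guarantee the merged list contains the global top `k`, since no document outside a shard's top `k` can rank higher overall.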
Lessons learned
A search engine like Apache Solr is…
• not limited to search-based business applications.
• a first-class citizen in your persistence technology stack; it complements the SoT database.
• easy to adopt and has all of us as a community for support.
The Future
• Symbiotic existence of Solr/Lucene with RDBMS, NoSQL and Big Data systems
• Walmart is committed to being part of the community building it
Questions? Reach us at:
• You can reach me, Ayon Sinha, at:
– [email protected]
– https://www.linkedin.com/in/ayonsinha
• Jason Sardina, our Lead Persistence Architect
– [email protected]
• @WalmartLabs is always hiring the best